COMPRESSED SENSING SIGNAL RECOVERY VIA A* ORTHOGONAL MATCHING PURSUIT

Nazim Burak Karahanoglu^{a,b}, Hakan Erdogan^{b}

^a Information Technologies Institute, TUBITAK-BILGEM, Kocaeli, Turkey
^b Department of Electronics Engineering, Sabanci University, Istanbul, Turkey

ABSTRACT
Reconstruction of sparse signals acquired in reduced dimensions requires finding the solution with minimum ℓ0 norm. As solving the ℓ0 minimization problem directly is impractical, a number of algorithms have appeared for finding indirect solutions. A semi-greedy approach, A* Orthogonal Matching Pursuit (A*OMP), was proposed in [1], where the solution is searched for on several paths of a search tree. Paths of the tree are evaluated and extended according to a cost function, for which novel dynamic auxiliary cost functions are suggested. This paper briefly describes the A*OMP algorithm and the proposed cost functions. The novel dynamic auxiliary cost functions are shown to provide improved results compared to a conventional choice. Reconstruction performance is illustrated on both synthetically generated data and real images, showing that the proposed scheme outperforms well-known CS reconstruction methods.

Index Terms— compressed sensing, best-first search, A* search, orthogonal matching pursuit, auxiliary functions for A* search
1. INTRODUCTION

Compressed sensing (CS) deals with the acquisition of sparse signals directly in reduced dimensions. It is not possible, however, to recover the signal directly from this set of reduced observations. To illustrate, consider a K-sparse signal x of length N and model the observation as y = Φx, where Φ is the observation matrix of size M × N. As M < N, x cannot be obtained directly from y. Instead, the problem is formulated as recovering the sparsest (equivalently, the minimum ℓ0-norm) solution:

$$\hat{x} = \arg\min_x \|x\|_0 \quad \text{s.t.} \quad y = \Phi x. \tag{1}$$

Direct solution of (1) is computationally intractable [2]. Recently, a number of algorithms have emerged that find a solution by exploiting the sparsity of x; an overview of these can be found in [3]. Convex relaxation replaces the ℓ0 minimization in (1) with ℓ1 minimization, making the solution possible via computationally tractable convex optimization algorithms such as linear programming, as proposed by Basis Pursuit (BP) [4]. Greedy pursuit algorithms, on the other hand, employ iterative search mechanisms. Orthogonal Matching Pursuit (OMP) [5] and Regularized Orthogonal Matching Pursuit (ROMP) [6] attempt sequential recovery by adding one or more nonzero entries at each iteration. Compressive Sampling Matching Pursuit (CoSaMP) [7] and Subspace Pursuit (SP) [8] refine the representation by iteratively modifying x.

The authors of this paper have recently suggested a tree-based semi-greedy approach, A* Orthogonal Matching Pursuit (A*OMP) [1]. This method solves the CS reconstruction problem with A* search [9, 10], a best-first search technique, employing OMP at each branch of the search tree. By incorporating best-first search, multiple paths can be evaluated during the search, and novel dynamic cost functions provide appropriate selection of branches and improve reconstruction. The results in [1] indicate that the introduced method provides better reconstruction than BP, SP and OMP in most cases. The multi-path strategy reduces the error effectively by decreasing the number of misidentified components even when there is no exact reconstruction, and the new algorithm performs considerably better than its competitors for natural images as well.

In this work, we summarize the findings of [1]. OMP is introduced briefly in the next section, and A*OMP is discussed in Section 3. Before concluding, the reconstruction performance is demonstrated on synthetically generated data and images in comparison to BP, SP and OMP. For a more detailed discussion of A*OMP, readers are referred to [1], which had been submitted to the Digital Signal Processing journal and was under review at the time of writing; it is available at http://arxiv.org/abs/1009.0396.

2. ORTHOGONAL MATCHING PURSUIT
Consider the observation y = Φx for a K-sparse x: only K of the N columns of Φ contribute to y. Greedy pursuit algorithms employ an iterative search to identify these K components. Matching Pursuit (MP) [5], the first greedy pursuit algorithm in the literature, aims at recovering one nonzero entry of x per iteration. OMP adds an orthogonalization step at each iteration that improves performance in overcomplete dictionaries [5].

OMP starts with an empty set S for keeping the selected columns of Φ. At each iteration, the algorithm determines the column of Φ that has the maximum absolute inner-product, i.e. correlation, with the residue r, which is the approximation error of y, and adds this column to S. Once S is updated, ŷ, the approximation of y, is computed via an orthogonal projection of y onto S, and the residue is updated as r = y − ŷ. Since the vectors in S are not mutually orthogonal, it is this orthogonal projection that ensures the residue is orthogonal to S; hence, this step is an essential extension to MP. After K iterations, K columns of Φ have been collected in S, and the residue is 0, or very small, if the K components are correctly identified. The vectors in S point out the locations of the nonzero entries of x, and the values of these entries are taken from the orthogonal projection of y onto S. A minimal sketch of this loop is given below.
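To make the loop above concrete, the following minimal NumPy sketch (our own rendering for this summary, not the implementation of [1] or [5]) recovers a K-sparse x from y and Φ; it omits practical details such as an early-exit tolerance on the residue.

    import numpy as np

    def omp(Phi, y, K):
        # Greedy OMP loop: at each iteration pick the column of Phi most
        # correlated with the residue, then recompute the residue by an
        # orthogonal projection onto all selected columns.
        S = []                                   # indices of selected columns
        r = y.copy()                             # residue; initially y itself
        coef = np.zeros(0)
        for _ in range(K):
            j = int(np.argmax(np.abs(Phi.T @ r)))    # best-matching column
            S.append(j)
            # orthogonal projection of y onto the span of selected columns
            coef, *_ = np.linalg.lstsq(Phi[:, S], y, rcond=None)
            r = y - Phi[:, S] @ coef
        x_hat = np.zeros(Phi.shape[1])
        x_hat[S] = coef                          # nonzero values from projection
        return x_hat, S

The least-squares step realizes the orthogonal projection, so the returned residue is orthogonal to the selected columns, as described above.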
3. A* ORTHOGONAL MATCHING PURSUIT

Some CS reconstruction algorithms, such as BP, SP, ROMP and CoSaMP, are guaranteed to provide exact reconstruction when the restricted isometry property (RIP) [2, 11] is satisfied. However, the
number of random observations required to satisfy the RIP increases with the sparsity K. That is, when the number of observations is kept constant, the probability of exact reconstruction decreases with increasing K. The multi-path strategy of A*OMP improves reconstruction especially at sparsity levels where the exact reconstruction probability of a single-path algorithm (or BP) starts falling. Searching over a number of candidates reduces the reconstruction error by avoiding many of the mistakes into which a single path would fall.

The A*OMP algorithm takes advantage of a multi-path strategy by performing A* search with an OMP-like algorithm at each branch, with the aim of identifying the best sparse representation among the branches. Details of this technique are explained in [1]; here we provide only a summary because of space limitations. A* search [9, 10] can be cast to our purposes by representing the column vectors of the observation matrix Φ as nodes of a search tree. The aim of the search is to iteratively find the path, i.e. the set of column vectors, that minimizes the ℓ2 norm of the residue. The algorithm can be outlined as follows: the search tree is initialized with I initial paths, each having a single node. At each iteration, the algorithm selects the most promising path in the tree and expands its best B branches. After each iteration, the residues and costs of the updated paths are calculated and stored, so that the most promising path can be found at the next iteration. The algorithm terminates when the most promising path already has the desired length K, i.e. when no other path promises a better result. In order to explain the algorithm and address pruning issues, we discuss its three stages below: initialization, selection of the best path, and expansion of the selected path.

3.1. Initialization of the Search Tree

A* search originally initializes the search tree with all single-node paths. Having N single-node paths is, however, impractical when N is large. In fact, most of these N vectors are irrelevant to y, as K is usually much smaller than N. Moreover, since the order of nodes in a path is not important, permutations of a single set of nodes can be removed. Hence, we start with only I ≪ K initial paths, consisting of the vectors having the highest absolute inner-products with y [1]. The results in Section 4 indicate that this choice is indeed appropriate.

3.2. Selection of the Best Path

Deciding on the most promising path of the search tree is not trivial, mostly for one reason: the paths on the tree are allowed to have different lengths. In order to compare the costs of paths with different lengths, A* search incorporates an auxiliary function [10] into the cost function. In our problem, the evaluation criterion for each path is the ℓ2 norm of its residue, ‖r‖2, and the cost model should reflect the decrease in ‖r‖2 that would occur if the path were complete, i.e. of length K. In [1], we propose three different cost structures. The additive cost model employs a static auxiliary function, whereas the two novel cost models, adaptive and multiplicative, incorporate dynamic auxiliary functions. In the rest of this manuscript, the A*OMP variants employing the additive, adaptive and multiplicative cost models are abbreviated as Add-A*OMP, Adap-A*OMP and Mul-A*OMP, respectively.

The additive cost model assumes that each of the K vectors in the representation makes an equal contribution to ‖y‖2.
We define the additive cost function for a path S^l of length l as

$$F_{add}(S^l) = \|r^l\|_2 - \beta\,\frac{K-l}{K}\,\|y\|_2, \tag{2}$$
where r^l is the residue obtained after the orthogonal projection of y onto the l selected vectors, and the rightmost term is the auxiliary function. β is a constant that should be greater than 1; it discriminates between long and short paths. If β is large, shorter paths are favored for extension, while a smaller β favors longer paths.

The adaptive cost model employs an auxiliary function based on the decrease of the residue caused by the addition of the last node to the path. Since selection by maximum correlation steers A*OMP toward selecting vectors in decreasing order of their contributions to y, the decay of the residue will generally be smaller at the later nodes of a path. Exploiting this, the adaptive cost function is defined as

$$F_{adap}(S_i^l) = \|r_i^l\|_2 - \beta\left(\|r_i^{l-1}\|_2 - \|r_i^l\|_2\right)(K-l), \tag{3}$$
where the subscript i denotes the path for which the cost function is computed, and β behaves similarly to the additive cost model.

The multiplicative cost model imposes a weighting of the evaluation function g(S_i^l) = ‖r_i^l‖2, assuming that the addition of each vector to the representation reduces the residual error by a constant ratio α. We model this by multiplying ‖r_i^l‖2 with a weight:

$$F_{mul}(S_i^l) = \alpha^{K-l}\, g(S_i^l) = \alpha^{K-l}\, \|r_i^l\|_2. \tag{4}$$

Here, α ∈ (0, 1] defines how fast the cost function decays. As the cost function governs the path selection process, choosing it appropriately plays an important role in reconstruction. Because they incorporate actual knowledge about the path, the dynamic cost functions model the residue better and improve performance in terms of not only reconstruction rates but also search time. The results in Section 4 clearly favor the dynamic cost models. The sketch below illustrates the three cost functions.
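The three cost models translate directly into code. The following sketch is our own illustration of equations (2)–(4); the default parameter values simply follow the experimental setup in Section 4.

    import numpy as np

    def cost_additive(res, y, K, l, beta=1.25):
        # Eq. (2): static auxiliary term, assuming each of the K vectors
        # contributes equally to ||y||_2
        return np.linalg.norm(res) - beta * (K - l) / K * np.linalg.norm(y)

    def cost_adaptive(res, prev_res, K, l, beta=1.25):
        # Eq. (3): dynamic auxiliary term based on the decrease of the
        # residue caused by the last selected node
        dec = np.linalg.norm(prev_res) - np.linalg.norm(res)
        return np.linalg.norm(res) - beta * dec * (K - l)

    def cost_multiplicative(res, K, l, alpha=0.5):
        # Eq. (4): weight the residue norm, assuming each future node
        # shrinks the residual error by the constant ratio alpha
        return alpha ** (K - l) * np.linalg.norm(res)

For a complete path (l = K) all three reduce to the plain residue norm, so completed paths are compared on their true evaluation function.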
3.3. Expanding the Selected Path

Typical A* search adds all possible extensions of the selected path to the stack, which results in too many search paths when N is large, increasing both memory and computational requirements. Assuming K ≪ N, the number of search paths in the stack may grow up to N^K [1]. In order to limit the number of search paths, we employ three pruning strategies:

(i) Extensions-per-path pruning: As usually K ≪ N, most of the possible extensions of the selected path are irrelevant to the signal y. In addition, permutations of the nodes within a path are equivalent. We exploit these facts by adding to the tree only the best B extensions per selected path, chosen as the vectors having the highest absolute inner-products with the residue. This strategy reduces the maximum number of paths in the tree from N^K to B^K, where B ≪ N.

(ii) Stack-size pruning: Despite extensions-per-path pruning, the maximum number of search paths can still grow very large in many applications. Therefore, A*OMP also limits the number of paths in the stack to P. If the number of paths in the tree exceeds P after an iteration, the paths with the highest costs are removed until P paths remain. This strategy is also known as "beam search".

(iii) Equivalent-path pruning: As permutations of the nodes in a path are equivalent, avoiding them reduces the growth of the search stack. Equivalent-path pruning compares each new path to the paths in the tree, and ignores it if an equivalent path already exists. Path equivalency is defined as follows [1]: let S1 and S2 be two paths with lengths l1 and l2, respectively, where l1 ≥ l2. S1 and S2 are equivalent if and only if S1 contains all the vectors in S2 and these vectors were selected during the first l2 iterations. Then, orthogonal projection of y onto the same set of vectors ensures that the residue of S2 is equal to
the residue obtained after the first l2 nodes of S1. Hence, the extensions of S2 have already been considered, and S2 can safely be neglected.

Based on these pruning strategies, the expansion of a selected path proceeds as follows: the B column vectors of Φ having the highest inner-products with the residue of the path are selected. Each resulting new path is compared to the paths in the tree, and is added to the tree if and only if no equivalent path is found. For each new path, the residue is computed by the orthogonal projection of y onto the set of selected vectors and stored, together with the cost, for the next iteration. Finally, if the number of paths in the tree exceeds P, the paths with the highest costs are removed until P paths remain. A sketch of one such iteration is given below.
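Putting the selection, expansion and pruning steps together, one iteration of the search could be sketched as follows. The path representation (a dict holding a node set, residue and cost) is our own illustrative choice, and the equivalent-path check is simplified to an identical-node-set test rather than the full prefix condition defined above.

    import numpy as np

    def astaromp_iteration(Phi, y, tree, B, P, K, cost_fn):
        # One search iteration (our sketch of Sec. 3.3). Each path is a dict
        # with 'idx' (set of column indices), 'res' (residue) and 'cost'.
        # cost_fn(res, l) can wrap any of the cost functions sketched above.
        best = min(tree, key=lambda p: p['cost'])     # most promising path
        if len(best['idx']) == K:
            return best, tree                         # termination condition
        tree.remove(best)
        # (i) extensions-per-path pruning: only the B best extensions
        corr = np.abs(Phi.T @ best['res'])
        corr[list(best['idx'])] = -np.inf             # never reselect a node
        for j in np.argsort(corr)[-B:]:
            idx = best['idx'] | {int(j)}
            # (iii) equivalent-path pruning, simplified to a check for an
            # identical set of nodes already present in the tree
            if any(p['idx'] == idx for p in tree):
                continue
            cols = sorted(idx)
            coef, *_ = np.linalg.lstsq(Phi[:, cols], y, rcond=None)
            res = y - Phi[:, cols] @ coef
            tree.append({'idx': idx, 'res': res,
                         'cost': cost_fn(res, len(cols))})
        # (ii) stack-size pruning ("beam search"): keep the P lowest costs
        tree.sort(key=lambda p: p['cost'])
        return None, tree[:P]

With, e.g., cost_fn = lambda res, l: cost_multiplicative(res, K, l), the loop is repeated until a completed path is returned.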
Fig. 2. Reconstruction results for sparse signals with normally distributed coefficients.
4. RESULTS

In this section, we demonstrate the reconstruction performance of A*OMP in comparison to BP, SP and OMP. Unless explicitly stated, the tests share the following setup: the A*OMP parameters were selected as I = 3, B = 2, P = 200, β = 1.25 and α = 0.5, and the entries of the observation matrix were modeled as i.i.d. Gaussian random variables with mean 0 and standard deviation 1/N.

The first task we consider is the reconstruction of K-sparse signals of length N = 256 for K between 5 and 50. For each K, we generated 500 K-sparse vectors whose nonzero entries were drawn from the uniform distribution U[−1, 1], and M = 100 random observations were taken per vector. A different observation matrix was employed for each test sample in order to generalize the performance. Reconstruction accuracy is reported as the average normalized mean squared error (NMSE) over the 500 samples and as the exact reconstruction rate. A sketch of this setup is given below.
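For reference, here is a minimal sketch of this setup, reusing the omp function from Section 2 as a stand-in for the compared algorithms; normalizing the squared error by ‖x‖² is our assumption of the standard NMSE definition.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, n_trials = 256, 100, 500
    K = 20                                    # one of the tested sparsities

    nmse = []
    for _ in range(n_trials):
        x = np.zeros(N)                       # K-sparse, nonzeros ~ U[-1, 1]
        support = rng.choice(N, size=K, replace=False)
        x[support] = rng.uniform(-1.0, 1.0, size=K)
        Phi = rng.normal(0.0, 1.0 / N, size=(M, N))  # fresh matrix per sample
        y = Phi @ x
        x_hat, _ = omp(Phi, y, K)             # stand-in for A*OMP / SP / BP
        nmse.append(np.sum((x - x_hat) ** 2) / np.sum(x ** 2))
    print("average NMSE:", np.mean(nmse))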
Fig. 1. Reconstruction results for sparse signals with uniformly distributed coefficients.

The results of these simulations are depicted in Fig. 1. They indicate the superiority of A*OMP with dynamic cost functions: Adap-A*OMP and Mul-A*OMP clearly outperform the other algorithms, including BP, which performs ℓ1 minimization. All A*OMP variants provide lower NMSE than the others. As for the exact reconstruction rates, the drastic fall of Add-A*OMP shows the superiority of the proposed dynamic cost functions. Despite its high NMSE, SP competes with Adap-A*OMP and Mul-A*OMP in exact reconstruction rates, even exceeding them slightly at K = 30. This is investigated in [1]: at K = 30, when exact reconstruction cannot be obtained, SP misidentifies nearly half of the nonzero entries of x, while Mul-A*OMP misses only one or two. Similarly, the error pdf's presented in [1] indicate that the MSE values of Mul-A*OMP lie very close to 0, while those of SP range up to 0.8 with a mean of about 0.3. Clearly, once SP fails, the amount of error introduced is much higher than for A*OMP. Consequently, Adap-A*OMP and Mul-A*OMP are superior to SP, also providing lower reconstruction errors in case of non-exact reconstruction.

The second test case employs the same setup as above, except that the nonzero entries of the sparse vectors are drawn from the normal distribution with mean 0 and standard deviation 1. The reconstruction results in Fig. 2 show that Mul-A*OMP provides both the lowest average NMSE and the highest exact reconstruction rate. SP yields the second-best exact reconstruction rate but poor NMSE, as it fails almost completely when the reconstruction is not exact.

In [1], reconstruction results for sparse binary signals are also reported. This case is known to be harder for greedy methods than for ℓ1 minimization. The results follow this observation: BP yields the best reconstruction, and A*OMP is the second best.

Finally, we test A*OMP on images. In this case, the images need not be sparse themselves: natural images have nearly sparse representations in some bases, and we exploit the sparse representation of images in the 2D Haar wavelet basis, which we denote by Ψ. To illustrate, let α be the sparse representation of x in Ψ, i.e. x = Ψα. Substituting this into (1) yields the new reconstruction problem

$$\hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad y = \Phi x = \Phi\Psi\alpha = V\alpha, \tag{5}$$
where V = ΦΨ is called the holographic basis. According to (5), the reconstruction algorithm should choose among the columns of V to find α; x is then reconstructed as x = Ψα. [1] explains this in detail. We slightly modify the algorithm to account for the nature of images. First, for natural images we know a priori that the coefficient of the DC component of the basis generally dominates the others, or is at least nonzero; hence, we can safely select the DC component a priori for each path, and the initialization is modified to select two vectors for each initial path, one of them being the DC component. Second, the images are processed in 8 × 8 blocks. Block processing reduces the complexity and memory requirements by decomposing the problem into much simpler subproblems, decreasing both the number and the length of the paths involved in the search [1].

The simulations were performed on five well-known 512 × 512 grayscale images: 'Lena', 'Tracy', 'Pirate', 'Cameraman' and 'Mandrill'. The images were first preprocessed such that each 8 × 8 block is K-sparse in the 2D Haar wavelet basis with K = 14, and all results were obtained using these sparse images. From each block, a measurement of length M = 32 was computed using a random Gaussian observation matrix. Mul-A*OMP and Adap-A*OMP were run for both B = 2 and B = 3, with β = 1.5. Table 1 lists the obtained peak signal-to-noise ratio (PSNR) values, which show that A*OMP outperforms BP, SP and OMP, and that increasing B from 2 to 3 further improves the reconstruction. A*OMP increases the PSNR by up to 8.5 dB, and on average by about 6.3 dB, in comparison to BP. Fig. 3 depicts 'Lena' and its reconstructions using SP, BP and Adap-A*OMP with B = 3. The SP reconstruction is the worst: the blocks for which SP fails are clearly visible. Adap-A*OMP performs better than BP, especially around boundaries and in detailed regions.
Table 1. PSNR (dB) for images reconstructed using different reconstruction algorithms.

                BP     OMP    SP     Mul-A*OMP       Adap-A*OMP
                                     B=2     B=3     B=2     B=3
    Lena        27.5   23.6   21.5   30.2    33.3    29.5    33.4
    Tracy       34.6   30.8   27.9   38.0    42.5    37.9    41.6
    Pirate      25.7   21.7   19.3   27.5    30.5    27.4    29.1
    Cameraman   28.4   24.7   22.5   32.6    36.9    31.9    35.6
    Mandrill    22.3   18.4   16.1   24.1    26.7    23.5    26.8
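To illustrate the block-based recovery described above, the sketch below recovers a single 8 × 8 block through the holographic basis V = ΦΨ and computes the PSNR. The Haar-basis helper, the use of plain OMP in place of A*OMP, and the omission of the a-priori DC selection are our simplifications of the actual procedure.

    import numpy as np

    def haar_basis_2d(n=8):
        # hypothetical helper: orthonormal 2D Haar basis for an n x n block,
        # with one vectorized basis image per column (n*n x n*n)
        H = np.array([[1.0]])
        while H.shape[0] < n:
            k = H.shape[0]
            H = np.vstack([np.kron(H, [1, 1]), np.kron(np.eye(k), [1, -1])])
        H = H / np.linalg.norm(H, axis=1, keepdims=True)   # orthonormal rows
        return np.kron(H.T, H.T)                           # separable 2D basis

    def recover_block(Phi, Psi, y, K):
        # recover one 8x8 block by searching over the columns of V = Phi Psi
        V = Phi @ Psi
        alpha, _ = omp(V, y, K)          # OMP stands in for A*OMP here
        return (Psi @ alpha).reshape(8, 8)   # back to the pixel domain

    def psnr(img, rec, peak=255.0):
        # peak signal-to-noise ratio, as reported in Table 1
        return 10 * np.log10(peak ** 2 / np.mean((img - rec) ** 2))

The first column of the returned basis is the constant (DC) image, which is the component the modified initialization of A*OMP would select a priori.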
Fig. 3. Reconstructions of 'Lena' using different algorithms (panels: Original Image; BP Reconstruction; SP Reconstruction; Adap. A*OMP Reconstruction).

Fig. 4. Reconstruction error per pixel of 'Lena' for BP and Adap-A*OMP with B = 3 (panels: BP Reconstruction Error; Adap. A*OMP Reconstruction Error; error scale 0–150).
To visualize the difference, the absolute error per pixel of the BP and Adap-A*OMP reconstructions is illustrated in Fig. 4. For BP, the errors are concentrated around boundaries and in detailed regions, while Adap-A*OMP clearly produces less distortion.

5. CONCLUSIONS

A*OMP employs a multi-path search strategy that combines best-first search with OMP to solve the CS reconstruction problem. The search process favors the paths that minimize the cost function. For appropriate path-length compensation, we define two novel dynamic cost functions, multiplicative and adaptive, in addition to the static additive cost function. The reconstruction performance of A*OMP is illustrated on both generated data and images. The results show that A*OMP outperforms the BP, SP and OMP algorithms when the dynamic cost functions are employed. This reconstruction performance indicates that the use of A* search in CS reconstruction is a promising approach that significantly reduces reconstruction errors.

6. REFERENCES

[1] N. B. Karahanoglu and H. Erdogan, "A* orthogonal matching pursuit: Best-first search for compressed sensing signal recovery," Digital Signal Processing, submitted for publication.
[2] E. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[3] J. A. Tropp and S. J. Wright, "Computational methods for sparse solution of linear inverse problems," Proceedings of the IEEE, vol. 98, no. 6, pp. 948–958, June 2010.
[4] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comp., vol. 20, no. 1, pp. 33–61, 1998.
[5] Y. C. Pati, R. Rezaiifar, and P. S. Krishnaprasad, "Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition," in Proc. 27th Asilomar Conference on Signals, Systems and Computers, Los Alamitos, CA, 1993, vol. 1, pp. 40–44.
[6] D. Needell and R. Vershynin, "Uniform uncertainty principle and signal recovery via regularized orthogonal matching pursuit," Found. Comput. Math., vol. 9, no. 3, pp. 317–334, June 2009.
[7] D. Needell and J. A. Tropp, "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples," Appl. Comp. Harmonic Anal., vol. 26, pp. 301–321, 2008.
[8] W. Dai and O. Milenkovic, "Subspace pursuit for compressive sensing signal reconstruction," IEEE Trans. Inf. Theory, vol. 55, no. 5, pp. 2230–2249, May 2009.
[9] P. E. Hart, N. J. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum cost paths," IEEE Trans. Syst. Sci. Cybern., vol. 4, no. 2, pp. 100–107, July 1968.
[10] F. Jelinek, Statistical Methods for Speech Recognition, chapter 6, MIT Press, Cambridge, MA, USA, 1997.
[11] E. Candès and T. Tao, "Near-optimal signal recovery from random projections: universal encoding strategies?," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.