COMPRESSED SENSING – A LOOK BEYOND LINEAR PROGRAMMING

Christian R. Berger, Javier Areta, Krishna Pattipati and Peter Willett
Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269
(Supported by the Office of Naval Research grant N00014-07-1-0429.)

ABSTRACT

Recently, significant attention in compressed sensing has been focused on Basis Pursuit, which exchanges the cardinality operator for the l1-norm and thereby leads to a linear programming formulation. Here, we want to look beyond the l1-norm in two ways: first, by investigating non-linear solutions of higher complexity but closer to the original problem; second, by improving known low-complexity solutions based on Matching Pursuit using rollout concepts. Our simulation results concur with previous findings that once x is "sparse enough", many algorithms find the correct solution; for averagely sparse problems, however, we find that the l1-norm often does not converge to the correct solution – in fact, it is outperformed by Matching Pursuit based algorithms of lower complexity. The non-linear algorithm we suggest has increased complexity, but shows superior performance in this setting.

Index Terms— Compressed sensing, sparse estimation, non-linear programming, rollout.

1. INTRODUCTION

In sparse estimation, a signal x ∈ C^n is estimated from a limited set of measurements. These measurements are linear projections of the form y = Ax ∈ C^m onto a known set of, generally dependent, vectors in A. The interesting aspect of this problem appears when m < n and one is interested in the minimum number of measurements necessary to reconstruct the original signal. The problem only has a well-defined solution when we can assume the signal x to be sparse in nature, i.e., the number of non-zero elements of x, k = card(x), is small – necessarily smaller than the number of measurements. Since we cannot know which elements of x are non-zero, this generally leads to an intractable combinatorial optimization problem.

Different versions of Matching Pursuit have been applied in signal processing as greedy solutions, see e.g. [1, 2]; lately, significant research has been focused on Basis Pursuit, see e.g. [3, 4, 5] and references therein. Besides solutions of feasible complexity, the new focus of this research is the theoretical relationship between the minimum number of needed measurements, the sparseness of the solution and the properties of the measurement matrix A. The interest in Basis Pursuit
was triggered by the observation that replacing the explicit sparseness constraint on x with its l1-norm often leads to the same solution. This in turn enables a convex problem formulation, which can be solved efficiently using interior point methods, see e.g. [6]. Although there are established results characterizing when the true sparseness constraint can be replaced with the l1-norm [3], they are not tight, in the sense that in many cases not covered by those results the solution can still be found. Interestingly, the condition developed in [3] applies equally to greedy algorithms (Orthogonal Matching Pursuit) and to the l1-norm formulation. This gives rise to the impression that once the problem is "sparse enough", the solution is fairly robust and can be found by different algorithms.

In this work we focus on algorithms that push the frontier of solvable problems, i.e., algorithms that can find the optimal solution with a minimum number of measurements even for only averagely sparse problems. We compare different algorithms while reducing the number of measurements and/or the degree of sparseness in x, to find the threshold at which each algorithm "breaks" away from the optimal solution. In this context, we are interested in two types of algorithms:

1. A problem formulation which replaces the sparseness card(x) with an approximation different from the l1-norm, leading to a closer approximation of the original problem.

2. Improved versions of existing greedy algorithms based on Matching Pursuit.

The rest of this paper is organized as follows: in Section 2, we present our new solutions; in Section 3, we discuss implementation and numerical results. Finally, we conclude the paper in Section 4.

2. PROPOSED SOLUTIONS

2.1. Problem Formulation

We want to solve the following problem:

\hat{x} = \arg\min_{x} \; \mathrm{card}(x) \quad \text{subj. to} \quad |Ax - y|_2 \le \epsilon    (1)

where x ∈ C^n, y ∈ C^m, A ∈ C^{m×n} and m < n. The uniqueness of the solution is connected to k := card(x) ≪ n, i.e., the sparseness condition on x.
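Since (1) involves the cardinality operator directly, the only generally applicable exact approach is a search over candidate supports. The following sketch (real-valued for simplicity; function names and the toy problem sizes are ours, not the paper's) makes this combinatorial nature explicit:

```python
import itertools
import numpy as np

def l0_brute_force(A, y, eps=1e-9):
    """Exhaustive search over supports for problem (1); real-valued sketch.
    Only feasible for very small n -- it illustrates why minimizing card(x)
    directly is combinatorial."""
    m, n = A.shape
    if np.linalg.norm(y) <= eps:                     # x = 0 already fits
        return np.zeros(n)
    for k in range(1, n + 1):                        # increasing cardinality
        for support in itertools.combinations(range(n), k):
            cols = list(support)
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.linalg.norm(y - A[:, cols] @ coef) <= eps:   # |Ax - y|_2 <= eps
                x = np.zeros(n)
                x[cols] = coef
                return x
    return None

# toy instance (illustrative sizes, far smaller than the paper's)
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 12))
x_true = np.zeros(12)
x_true[[2, 7]] = rng.standard_normal(2)
x_hat = l0_brute_force(A, A @ x_true)
print(np.flatnonzero(x_hat))                         # expected: [2 7]
```

The number of candidate supports grows combinatorially with n, which is what motivates the relaxations discussed next.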
2.2. Implementation via Lagrangian Relaxation

The difficulty in the original problem is that the cardinality operator

\mathrm{card}(x) = \sum_{i=1}^{n} \mathrm{card}(x_i), \qquad \mathrm{card}(x_i) = \begin{cases} 1, & x_i \neq 0 \\ 0, & x_i = 0 \end{cases}    (2)

is non-differentiable. The interpretation of the cardinality operator as a "zero-norm" is one approach,

\lim_{p \to 0} |x|_p = \lim_{p \to 0} \sum_{i=1}^{n} |x_i|^p = \mathrm{card}(x),    (3)

where p = 1, i.e., the l1-norm, is the norm with the smallest p which is still convex. We formulate the problem by using a new constraint:

\hat{x} = \arg\min_{x, z} \; \sum_{i=1}^{n} z_i    (4)

\text{subj. to} \quad x_i (1 - z_i) = 0 \;\; \forall i    (5)

\quad\quad\quad\;\; |Ax - y|_2 < \epsilon    (6)

where z_i ∈ {0, 1}. Applying Lagrangian relaxation, we get

\min_{x, z} J(x, z) = \min_{x} \min_{z} \left[ \sum_{i=1}^{n} z_i + \sum_{i=1}^{n} \lambda_i x_i (1 - z_i) \right]    (7)

\phantom{\min_{x, z} J(x, z)} = \min_{x} \min_{z} \left[ \sum_{i=1}^{n} z_i (1 - \lambda_i x_i) + \sum_{i=1}^{n} \lambda_i x_i \right].    (8)

Evidently the minimization over z yields

z_i = \begin{cases} 1, & \lambda_i x_i > 1 \\ 0, & \lambda_i x_i \le 1 \end{cases}

Inserting this solution back into (7),

\min_{x} \tilde{J}(x, \lambda) = \min_{x} \left[ \sum_{i=1}^{n} \min\{1 - \lambda_i x_i, 0\} + \sum_{i=1}^{n} \lambda_i x_i \right] = \min_{x} \sum_{i=1}^{n} \min\{1, \lambda_i x_i\}    (9)

where now we have an additional maximization over λ. Examining the plot of this objective function in Fig. 1, we can note several things:

• If we maximize over λ, this function is equivalent to the cardinality operator.
• For λ = sign(x), the l1-norm is the convex extension of this function.

Fig. 1. The plot of min{1, λx} shows that, with concurrent maximization over λ, this objective function is equivalent to the cardinality operator (curves shown for λ = 1, 2, -1, -2).

2.3. Approximating the Cardinality Operator

The objective function in (9) is still not completely differentiable, and we have to update a vector of Lagrangian multipliers λ. Instead, we suggest a continuously differentiable approximation in the form of the hyperbolic tangent,

J_c(x) = \sum_{i=1}^{n} \tanh\!\left( c |x_i|^2 \right).    (10)

In Fig. 2 we can see that as c → ∞, this function converges to the cardinality operator. Use of the new objective function J_c leads to the following problem formulation, consisting of a series of non-linear optimization problems:

\lim_{c \to \infty} \; \min_{x} J_c(x) \quad \text{subj. to} \quad |Ax - y|_2 \le \epsilon.    (11)

We have the following derivatives of J_c,

\frac{\partial J_c}{\partial x_k} = \frac{2 c x_k}{\left[ \cosh\!\left( c |x_k|^2 \right) \right]^2}    (12)

\frac{\partial^2 J_c}{\partial x_k \, \partial x_l} = \delta_{kl} \, \frac{2c \left[ 1 - 4 c |x_k|^2 \tanh\!\left( c |x_k|^2 \right) \right]}{\left[ \cosh\!\left( c |x_k|^2 \right) \right]^2}    (13)

where δ_kl is the Kronecker delta. Accordingly, the gradient is well defined and the Hessian matrix has a diagonal structure. The Hessian is not positive definite, but due to the diagonal structure we can easily approximate it with a positive definite matrix to calculate a pseudo-Newton direction. For each c this non-linear optimization problem converges to a well-defined (local) minimum. To solve it, we use standard methods of non-linear programming [7].
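For the real-valued case, the objective (10) and its derivatives (12) and (13) can be written down directly; the sketch below (function names and the clipping floor are our choices, not the paper's) also shows the positive definite diagonal approximation used for a pseudo-Newton direction:

```python
import numpy as np

def Jc(x, c):
    """Smooth surrogate of the cardinality operator, cf. (10)."""
    return np.sum(np.tanh(c * x**2))

def grad_Jc(x, c):
    """Gradient of Jc for real x, cf. (12)."""
    return 2.0 * c * x / np.cosh(c * x**2) ** 2

def hess_diag_Jc(x, c):
    """Diagonal of the Hessian of Jc, cf. (13)."""
    return 2.0 * c * (1.0 - 4.0 * c * x**2 * np.tanh(c * x**2)) / np.cosh(c * x**2) ** 2

def pseudo_newton_direction(x, c, floor=1e-6):
    """Clip the Hessian diagonal from below so the scaling stays positive
    definite, then take a diagonally scaled (pseudo-)Newton step direction."""
    h = np.maximum(hess_diag_Jc(x, c), floor)
    return -grad_Jc(x, c) / h

# quick finite-difference check of (12) at a random point
rng = np.random.default_rng(0)
x, c, eps = 0.5 * rng.standard_normal(5), 5.0, 1e-6
num = np.array([(Jc(x + eps * e, c) - Jc(x - eps * e, c)) / (2 * eps) for e in np.eye(5)])
print(np.max(np.abs(num - grad_Jc(x, c))))   # should be close to zero
```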
Fig. 2. tanh(c|x|^2) is a continuously differentiable function which converges to the cardinality operator as c → ∞ (shown for c = 1, 5, 25).

Fig. 3. Probability of error detecting the non-zero components of x; 10^3 simulation runs, n = 512, m = 64. Curves: tangent (T), linprog (L), OMP (O), RMP (R), rollout (OR).
We start with a feasible solution, e.g., the least-squares solution, and solve for some small c initially. When increasing c, we use the previous solution to initialize the new problem; if we increase c slowly, the algorithm converges to a final solution. In practice, we will not solve each subproblem exactly, but only execute a few iterations, possibly just one, to save computation.
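A possible rendering of this continuation scheme in c is sketched below. The paper's implementation uses a MATLAB trust-region solver; here SciPy's SLSQP with the equality constraint Ax = y stands in for the noiseless case, and the c schedule, tolerance and iteration limits are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def tanh_recovery(A, y, c_schedule=(1, 5, 25, 125, 625), tol=1e-3, maxiter=50):
    """Outer loop over increasing c with warm starts; the stopping rule
    mirrors the convergence check in (14)."""
    n = A.shape[1]
    x = np.linalg.lstsq(A, y, rcond=None)[0]        # feasible least-squares start
    for c in c_schedule:
        res = minimize(
            lambda x: np.sum(np.tanh(c * x**2)),                       # J_c, cf. (10)
            x,
            jac=lambda x: 2.0 * c * x / np.cosh(np.minimum(c * x**2, 350.0)) ** 2,  # cf. (12), argument clipped
            method="SLSQP",
            constraints={"type": "eq", "fun": lambda x: A @ x - y},    # noiseless case, eps = 0
            options={"maxiter": maxiter},
        )
        x = res.x                                    # warm start for the next c
        t = np.tanh(c * x**2)
        if np.sum(t * (1.0 - t)) <= n * tol:         # cf. (14)
            break
    return x

# small illustrative example (sizes are not the paper's)
rng = np.random.default_rng(1)
m, n, k = 20, 60, 3
A = rng.standard_normal((m, n)); A /= np.linalg.norm(A, axis=0)
x_true = np.zeros(n); x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x_hat = tanh_recovery(A, A @ x_true)
print(np.flatnonzero(np.abs(x_hat) > 1e-3))          # estimated support
```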
2.4. Improving Greedy Solutions with Rollout Concepts

Rollout algorithms were first proposed for the approximate solution of dynamic programming recursions by Bertsekas et al. in [7]. They are a class of suboptimal solution methods inspired by the policy iteration of dynamic programming and the approximate policy iteration of neuro-dynamic programming. The rollout algorithm, combined with a base heuristic (e.g., Basis Pursuit, ...), can solve combinatorial optimization problems such as the one considered here with higher computational efficiency than the optimal strategies, while being superior to using the base algorithm alone.

2.5. Reducing Complexity of Greedy Algorithms

The Order-Recursive LS MP (RMP) algorithm is a variant of Matching Pursuit (MP) in which the next greedy choice is based on the actual reduction of the fitting error. This is a sensible metric, but leads to high complexity. To reduce this complexity, we suggest an efficient implementation. In the case of orthogonal columns, maximizing the reduction in the fitting error is equivalent to choosing the maximum projection, as in the usual MP algorithm. Therefore, we suggest calculating orthogonal basis vectors of the space spanned by the selected columns of A via the Gram-Schmidt procedure. When a new column is selected, we subtract its component from the columns not selected so far. This way we can add one orthogonal vector to the basis at each iteration without recalculating the previous basis vectors.

3. IMPLEMENTATION AND NUMERICAL RESULTS

3.1. Implementation

Both problem formulations in Section 2.2 and Section 2.3 are highly non-linear in nature. For the implementation we apply efficient algorithms based on descent directions [7], namely Trust-Region and Preconditioned Conjugate Gradient methods. The differentiable approximation of the cardinality operator in Section 2.3 has considerably lower complexity, stemming from the diagonal Hessian matrix. For the following simulations we used available off-the-shelf products, i.e., an implementation of the Trust-Region method available in the MATLAB Optimization Toolbox, which ran in a loop updating c at each iteration and checking convergence via

\sum_{i=1}^{n} \tanh\!\left( c |x_i|^2 \right) \left[ 1 - \tanh\!\left( c |x_i|^2 \right) \right] \le n \epsilon,    (14)

which terminates once the approximation of the cardinality operator has converged for each x_i.

The rollout implementation used the Orthogonal MP (OMP) algorithm to find a greedy solution and then varied it by recalculating a solution while excluding one of the elements picked early by the first greedy solution. To get a noticeable improvement in performance, we found about twenty iterations to be sufficient.
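The order-recursive selection of Section 2.5 and the rollout variation just described could be sketched as follows (real-valued, with our own function names; the paper uses OMP as the base heuristic for the rollout, while here the same order-recursive routine doubles as the base for brevity):

```python
import numpy as np

def rmp(A, y, k, forbidden=()):
    """Order-recursive LS matching pursuit (Sec. 2.5): unselected columns are
    deflated against each new Gram-Schmidt basis vector, so the largest
    normalized projection equals the largest drop in the fitting error."""
    m, n = A.shape
    B = A.astype(float)            # working copies of the columns (deflated)
    r = y.astype(float)            # current residual
    support = []
    for _ in range(k):
        norms = np.linalg.norm(B, axis=0)
        scores = np.abs(B.T @ r) / np.maximum(norms, 1e-12)
        blocked = list(support) + list(forbidden)
        if blocked:
            scores[blocked] = -np.inf
        j = int(np.argmax(scores))
        support.append(j)
        q = B[:, j] / norms[j]      # new orthonormal basis vector
        B = B - np.outer(q, q @ B)  # deflate the remaining columns
        r = r - q * (q @ r)         # update the residual
    x_hat = np.zeros(n)
    x_hat[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    return support, x_hat

def rollout(A, y, k, n_try=20):
    """Rollout variation of Sec. 3.1: re-run the pursuit with one of the atoms
    picked early by the greedy pass excluded, keep the best fit (about twenty
    such re-runs were reported to be sufficient)."""
    best_s, best_x = rmp(A, y, k)
    best_err = np.linalg.norm(y - A @ best_x)
    for idx in best_s[:n_try]:
        s, x = rmp(A, y, k, forbidden={idx})
        err = np.linalg.norm(y - A @ x)
        if err < best_err:
            best_s, best_x, best_err = s, x, err
    return best_s, best_x
```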
3.2. Numerical Results

We compare the non-linear optimization based on the tanh function with an l1-norm implementation, Orthogonal Matching Pursuit (OMP), order-recursive LS MP (RMP) and OMP with rollout. We use only noiseless observations, therefore ε = 0 in (1). We generate the signal x by randomly choosing k elements and assigning them values drawn from a zero-mean, unit-variance Gaussian distribution. The measurement matrix A has random entries from the same distribution, and we normalize its columns.

As a first scenario, we fix n = 512, m = 64 and slowly increase the ratio of non-zero elements to observations from 1/8 to 1/2; we observe in Fig. 3 that for small k/m, i.e., very sparse problems, basically all algorithms choose the correct non-zero elements of x. Still, the algorithms differ in how much less sparse a problem they can handle. Especially the l1-norm algorithm introduces many additional small elements into the solution, which cannot be discerned from the correct non-zero elements in the averagely sparse case, since each element of x is rather small itself.

Next we increase m, while keeping n = 512; for each m we increase the ratio k/m until the probability of error exceeds 15%. Fig. 4 confirms the trend observed before that the l1-norm cannot handle averagely sparse problems. This is even more so when the ratio m/n approaches 1/2, as this further reduces the sparseness. Comparatively, even the different OMP versions perform better, with the rollout outperforming RMP. Throughout, the strongest performance is achieved by the non-linear formulation using the tanh, but at the cost of high complexity, cf. Fig. 5. Using our implementations, all algorithms show a similar scaling behavior, which does not seem to give the l1-norm any edge over the OMP variations.

Fig. 4. Needed sparseness; plotted is the maximum ratio k/m for which each algorithm can still find the correct non-zero values of x in more than 85% of the cases, n = 512.

Fig. 5. Run time comparison for the different algorithms for increasing n (run time in seconds).
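The simulation protocol described above can be summarized in a short sketch (the recovery routine is passed in as a placeholder, e.g. the order-recursive pursuit sketched earlier; the default sizes and number of runs are illustrative rather than the paper's 10^3 runs):

```python
import numpy as np

def make_instance(n, m, k, rng):
    """One random instance as in Sec. 3.2: Gaussian A with unit-norm columns,
    k randomly placed Gaussian non-zero entries in x."""
    A = rng.standard_normal((m, n))
    A /= np.linalg.norm(A, axis=0)
    x = np.zeros(n)
    x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)
    return A, x, A @ x

def support_error_rate(recover, n=512, m=64, k=16, runs=100, seed=0):
    """Fraction of runs in which the recovered support differs from the true
    one; `recover(A, y, k)` is any of the algorithms under comparison."""
    rng = np.random.default_rng(seed)
    errors = 0
    for _ in range(runs):
        A, x_true, y = make_instance(n, m, k, rng)
        x_hat = recover(A, y, k)
        if set(np.flatnonzero(x_hat)) != set(np.flatnonzero(x_true)):
            errors += 1
    return errors / runs

# e.g., with the order-recursive pursuit sketched earlier:
#   support_error_rate(lambda A, y, k: rmp(A, y, k)[1], runs=100)
```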
4. CONCLUSION

We have suggested several new approaches to the sparse estimation problem. First, by approximating the cardinality operator via the tanh function, we obtain a non-linear algorithm which can also solve moderately sparse problems. Second, we improve known low-complexity greedy algorithms using rollout techniques, which leads to an algorithm of lower complexity than Basis Pursuit and better performance than regular OMP.

5. REFERENCES

[1] S. G. Mallat and Z. Zhang, "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3397–3415, Dec. 1993.

[2] R. Gribonval and E. Bacry, "Harmonic decomposition of audio signals with matching pursuit," IEEE Trans. Signal Processing, vol. 52, no. 1, pp. 101–111, Jan. 2003.

[3] J. A. Tropp, "Greed is good: Algorithmic results for sparse approximation," IEEE Trans. Inform. Theory, vol. 50, no. 10, pp. 2231–2242, Oct. 2004.

[4] D. Donoho, "Compressed sensing," IEEE Trans. Inform. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.

[5] E. J. Candes, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. Inform. Theory, vol. 52, no. 2, pp. 489–509, Feb. 2006.

[6] S.-J. Kim, K. Koh, M. Lustig, S. Boyd, and D. Gorinevsky, "A method for large-scale l1-regularized least squares problems with applications in signal processing and statistics," 2007, preprint, available at http://www.stanford.edu/~boyd/reports/l1_ls.pdf.

[7] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, Belmont, Massachusetts, USA, 2nd edition, 1995.