A Note on Complexity of Lp Minimization

Dongdong Ge∗        Xiaoye Jiang†        Yinyu Ye‡

February 9, 2010

Abstract

We show that the Lp (0 ≤ p < 1) minimization problem arising from sparse solution construction and compressed sensing is both hard and easy. More precisely, for any fixed 0 < p < 1, we prove that checking the global minimal value of the problem is NP-hard, while computing a local minimizer of the problem is polynomial-time doable. We also develop an interior-point algorithm with a provable complexity bound and demonstrate the effectiveness of the algorithm with preliminary computational results.

1 Short Introduction

In this note, we consider the following optimization problems:

$$\text{Minimize } p(x) := \sum_{1 \le j \le n} x_j^p \quad \text{subject to} \quad Ax = b,\ x \ge 0, \qquad (1)$$

and

$$\text{Minimize } \sum_{1 \le j \le n} |x_j|^p \quad \text{subject to} \quad Ax = b, \qquad (2)$$

where the data are A ∈ R^{m×n}, b ∈ R^m, and 0 < p < 1.

Sparse signal or solution reconstruction by solving optimization problem (1) or (2), especially for the cases 0 ≤ p ≤ 1, has recently received a great deal of attention. In signal reconstruction, one typically has linear measurements b = Ax∗, where x∗ is a sparse signal, and the sparse signal is recovered by solving the inverse problem (1) or (2) with p = 0, that is, by finding the sparsest (smallest support cardinality) solution of a linear system (here |x|^0 = 1 if x ≠ 0 and |x|^0 = 0 otherwise).

∗ Department of Management Science, Antai School of Economics and Management, Shanghai Jiao Tong University, Shanghai, China 200052. E-mail: [email protected]
† Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305. E-mail: [email protected]
‡ Department of Management Science and Engineering, Stanford University, Stanford, CA 94305. E-mail: [email protected]; research supported in part by NSF GOALI 0800151.


From the computational complexity point of view, when p = 0, problem (1) or (2) is NP-hard to solve [8]; when p = 1, both problems are linear programs and hence polynomial-time solvable. In [2, 6], it was shown that if a certain restricted isometry property holds for A, then the solutions of (2) for p = 0 and p = 1 are identical; hence, problem (2) with p = 0 can be relaxed to problem (2) with p = 1. However, the restricted isometry property may be too strong to hold for practical basis design matrices A. Instead, one may consider sparse recovery by solving relaxation problem (1) or (2) for a fixed p, 0 < p < 1. Recently, this approach has attracted substantial research effort in variable selection and sparse reconstruction, e.g., [7]. It exhibits desirable threshold bounds on the non-zero entries of a computed solution [5], and computational experience shows that by replacing p = 1 with some p < 1, reconstruction can be done equally fast with many fewer measurements while being more robust to noise and signal non-sparsity [4].

In this note, we show that the Lp (0 ≤ p < 1) minimization problem is both hard and easy. More precisely, the Lp (0 ≤ p < 1) minimization problem (1) or (2) is NP-hard. On the other hand, any basic (feasible) solution of (1) or (2) is a local minimizer, so that computing a local minimizer of the problem is polynomial-time doable. We also present a potential reduction algorithm that is an FPTAS for computing an approximate local minimizer. Numerical experiments are conducted to test its efficiency.
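For a quick illustration of the p = 1 relaxation mentioned above, the following minimal Python sketch (assuming NumPy and SciPy are available; the dimensions, random seed, and variable names are ours, not the paper's) recovers a sparse x∗ from a few Gaussian measurements by solving (2) with p = 1 as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, s = 10, 30, 2                         # illustrative dimensions and sparsity
A = rng.standard_normal((m, n))
x_star = np.zeros(n)
x_star[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_star

# L1 relaxation of (2): min ||x||_1  s.t.  Ax = b,
# written as a linear program via the split x = u - v with u, v >= 0.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_l1 = res.x[:n] - res.x[n:]
print(np.linalg.norm(x_l1 - x_star))        # small when the sparse signal is recovered
```

With fewer measurements, or with a less favorable A, this L1 relaxation starts to fail, which is exactly the regime where the Lp (p < 1) relaxations studied below are of interest.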

2 The Hardness

Theorem 1. The Lp (0 ≤ p < 1) minimization problem (1) is NP-hard.

Proof. We present a polynomial-time reduction from the well-known NP-complete partition problem. The partition problem can be described as follows: given a set S of integers or rational numbers {a_1, a_2, ..., a_n}, is there a way to partition S into two disjoint subsets S_1 and S_2 such that the sum of the numbers in S_1 equals the sum of the numbers in S_2? We describe a reduction from an instance of the partition problem to an instance of the Lp (0 ≤ p < 1) minimization problem (1) whose optimal value is n if and only if the former has an equitable bipartition.

Given an instance of the partition problem, let the vector a = (a_1, a_2, ..., a_n) ∈ R^n. Consider the following minimization problem in form (1):

$$\text{Minimize } P(x,y) = \sum_{1 \le j \le n} \big(x_j^p + y_j^p\big) \quad \text{subject to} \quad a^T(x-y)=0,\ x_j+y_j=1\ \forall j,\ x,y \ge 0. \qquad (3)$$

From the strict concavity of the objective function, x_j^p + y_j^p ≥ x_j + y_j (= 1) for all j, and equality holds if and only if (x_j = 1, y_j = 0) or (x_j = 0, y_j = 1). Thus, P(x, y) ≥ n for any (continuous) feasible solution of (3). If there is a feasible (optimal) solution pair (x, y) such that P(x, y) = n, it must be true that x_j^p + y_j^p = 1 = x_j + y_j for all j, so that (x, y) must be a binary solution, (x_j = 1, y_j = 0) or (x_j = 0, y_j = 1), which generates an equitable bipartition of the entries of a. On the other hand, if the entries of a have an equitable bipartition, then (3) must have a binary solution pair (x, y) such that P(x, y) = n. Thus we prove the NP-hardness of problem (1).

For the same instance of the partition problem, we consider the following minimization problem in form (2):

$$\text{Minimize } \sum_{1 \le j \le n} \big(|x_j|^p + |y_j|^p\big) \quad \text{subject to} \quad a^T(x-y)=0,\ x_j+y_j=1\ \forall j. \qquad (4)$$

Note that this problem has no non-negativity constraints on the variables (x, y). However, for any feasible solution (x, y) of the problem, we still have |x_j|^p + |y_j|^p ≥ x_j + y_j (= 1) for all j. This is because the minimal value of |x_j|^p + |y_j|^p is 1 when x_j + y_j = 1, and equality holds if and only if (x_j = 1, y_j = 0) or (x_j = 0, y_j = 1). Therefore, we can similarly prove that this instance of the partition problem has an equitable bipartition if and only if the minimal objective value of (4) is n. This leads to:

Theorem 2. The Lp (0 ≤ p < 1) minimization problem (2) is NP-hard.

Note that the L1 minimization version of the reduced problem does not reveal much sparsity information about the solution set, since every feasible solution is a (global) minimizer.
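To make the reduction concrete, here is a small Python sketch (the instance a, the choice p = 1/2, and the helper names are illustrative, not taken from the paper) that checks a partition instance by brute force and evaluates the objective of (3) at a binary feasible point and at a fractional one:

```python
import itertools
import numpy as np

def P(x, y, p):
    """Objective P(x, y) = sum_j (x_j^p + y_j^p) of the reduced problem (3)."""
    return float(np.sum(np.abs(x) ** p + np.abs(y) ** p))

def has_equitable_bipartition(a, tol=1e-9):
    """Brute-force check of the partition instance: does some binary x
    (with y = 1 - x) satisfy a^T (x - y) = 0?"""
    a = np.asarray(a, dtype=float)
    for bits in itertools.product([0.0, 1.0], repeat=len(a)):
        x = np.array(bits)
        if abs(a @ (2.0 * x - 1.0)) < tol:   # x - y = 2x - 1 when y = 1 - x
            return True
    return False

p = 0.5
a = np.array([3.0, 1.0, 1.0, 2.0, 2.0, 1.0])     # {3, 2} versus {1, 1, 2, 1} is equitable
n = len(a)
print(has_equitable_bipartition(a))              # True

# A binary feasible point encoding the bipartition attains the lower bound P = n ...
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0]); y = 1.0 - x
print(abs(a @ (x - y)) < 1e-9, P(x, y, p))       # True 6.0

# ... while a fractional feasible point is strictly worse, by strict concavity of t^p.
x_frac = np.full(n, 0.5); y_frac = 1.0 - x_frac  # feasible: a^T (x - y) = 0
print(P(x_frac, y_frac, p))                      # about 8.49 > 6
```

A binary feasible point attains the lower bound n exactly when it encodes an equitable bipartition, while strict concavity makes every fractional feasible point strictly worse, which is the crux of the proof above.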

3 The Easiness

The above discussion reveals that finding a global minimizer of the Lp minimization problem is NP-hard whenever p < 1. Thus, relaxing p = 0 to some p < 1 gains no advantage in terms of the (worst-case) computational complexity. We now turn our attention to local minimizers. Note that, for many optimization problems, finding a local minimizer, or even checking whether a solution is a local minimizer, remains NP-hard. What about local minimizers of problems (1) and (2)? The answer is that they are easy to find.

Theorem 3. The set of all basic feasible solutions of (1) is exactly the set of all of its local minimizers.

Proof. Observe that the objective function of (1) is strictly concave and its feasible region is a convex polyhedral set. If x is a basic feasible solution (or extreme point), consider its ε (> 0) neighborhood in the feasible region. Any other feasible solution in the neighborhood must have an entry that is zero at x but takes a positive value less than ε. However, the derivative of ε^p can be made arbitrarily large by taking ε sufficiently small. This implies that the objective value must increase along any feasible direction starting from a basic feasible solution. Thus, x must be a strict local minimizer.


On the other hand, let x be a local minimizer that is not a basic feasible solution (extreme point). Then x must lie in the relative interior of a face of the convex polyhedral set, so there is a feasible direction d ≠ 0 such that both x + εd and x − εd are feasible for sufficiently small ε > 0, and either d or −d is a descent direction of the strictly concave objective function. Thus, x cannot be a local minimizer.

Similarly, we can prove

Theorem 4. The set of all basic solutions of (2) is exactly the set of all of its local minimizers.

The strict concavity implies that the Hessian of the objective cannot be positive semidefinite on any subspace of feasible directions unless the feasible face is a single point (a basic feasible solution). Thus one can observe the following property of a local minimizer.

Theorem 5. The first and second order necessary conditions are sufficient for a feasible solution to be a local minimizer of the Lp minimization problem (1).

Proof. Due to the strict concavity of the objective function, no point satisfies the first order condition except possibly one maximal point in the interior of the feasible region. This follows from the fact that, at a strictly interior point satisfying the first order condition, the derivative is zero along every direction. For points on the boundary, consider an arbitrary lower-dimensional boundary facet that is not a vertex. By removing the dimensions at which the coordinates of x are zero, we can still show that there is no (relative) interior point of the facet at which the first order condition holds (except a possible relative maximal point of this facet, at which the second order condition does not hold). So no point satisfies both the first and second order necessary conditions unless it lies in a single-point facet, which is a basic feasible solution. Thus the property follows directly from Theorem 3.
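As a tiny numerical check of Theorem 3 (the simplex instance and the choice p = 1/2 below are purely illustrative), the vertices of {x ≥ 0 : x_1 + x_2 + x_3 = 1} are exactly the points where the Lp objective cannot be decreased locally:

```python
import numpy as np

p = 0.5
def obj(x):
    """Objective of (1): sum_j x_j^p."""
    return float(np.sum(x ** p))

# Feasible set {x >= 0 : x_1 + x_2 + x_3 = 1} (the unit simplex); its vertices
# e_1, e_2, e_3 are exactly the basic feasible solutions.
e1 = np.array([1.0, 0.0, 0.0])
print(obj(e1))                                   # 1.0

# Moving a distance of order t away from the vertex raises the objective by order sqrt(t):
for t in [1e-2, 1e-4, 1e-6]:
    x = np.array([1.0 - 2.0 * t, t, t])          # still feasible
    print(t, obj(x) - obj(e1))                   # always positive

# A non-vertex feasible point is not a local minimizer: sliding toward a vertex
# along the feasible direction d = (1, -1, 0) decreases the objective.
x0 = np.array([0.5, 0.5, 0.0])
d = np.array([1.0, -1.0, 0.0])
print(obj(x0 + 1e-3 * d) - obj(x0))              # negative
```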

4 Interior-Point Algorithm

These local minimizer results show that there is little hope of solving (1) by starting from a basic feasible solution. Naturally, one would instead start from an interior feasible solution, such as the analytic center x^0 of the feasible polytope (if it is bounded and has an interior feasible point). Similar to potential reduction algorithms for linear programming, one could consider the potential function

$$\phi(x) = \rho \log\Big(\sum_{j=1}^n x_j^p - z\Big) - \sum_{j=1}^n \log x_j = \rho \log(p(x)-z) - \sum_{j=1}^n \log x_j, \qquad (5)$$

where z is a lower bound on the global minimal objective value of (1) and the parameter ρ > n. For simplicity, we set z = 0 throughout the discussion. Note that, by the arithmetic-geometric mean inequality,

$$\frac{\sum_{j=1}^n x_j^p}{n} \ge \Big(\prod_{j=1}^n x_j^p\Big)^{1/n}.$$

So

$$\frac{n}{p}\log(p(x)) - \sum_{j=1}^n \log x_j \ge \frac{n}{p}\log n.$$

Thus, if φ(x) ≤ (ρ − n/p) log ε + (n/p) log n, we must have p(x) ≤ ε, which implies that x is an ε-global minimizer.

In a manner similar to the potential reduction algorithm discussed in [10] for nonconvex quadratic minimization, one can consider the one-iteration update from x to x^+. Let d_x, with A d_x = 0, be a vector such that x^+ = x + d_x > 0. Then, from the concavity of log(p(x)), we have

$$\log(p(x^+)) - \log(p(x)) \le \frac{1}{p(x)}\nabla p(x)^T d_x.$$

On the other hand, if ||X^{-1} d_x|| ≤ β < 1, where X = Diag(x), we have

$$-\sum_{j=1}^n \log(x_j^+) + \sum_{j=1}^n \log(x_j) \le -e^T X^{-1} d_x + \frac{\beta^2}{2(1-\beta)},$$

where e is the vector of all ones.

Let d' = X^{-1} d_x. Then, to achieve a potential reduction, one can minimize an affine-scaled linear function subject to a ball constraint, as is done for linear programming:

$$z(d') := \text{Minimize } \Big(\frac{\rho}{p(x)}\nabla p(x)^T X - e^T\Big) d' \quad \text{subject to} \quad AXd' = 0,\ \|d'\|^2 \le \beta^2. \qquad (6)$$

This is simply a linear projection problem. Its minimal value is −β · ||g(x)||, and the optimal direction is given by d' = (β/||g(x)||) g(x), where

$$g(x) = -\big(I - XA^T(AX^2A^T)^{-1}AX\big)\Big(\frac{\rho}{p(x)} X\nabla p(x) - e\Big) = e - \frac{\rho}{p(x)} X\big(\nabla p(x) - A^T y\big).$$

If ||g(x)|| ≥ 1, then the minimal objective value of the subproblem is at most −β, so that

$$\phi(x^+) - \phi(x) \le -\beta + \frac{\beta^2}{2(1-\beta)}.$$

Thus, the potential value is reduced by a constant (for example, by at least 1/4 when β = 1/2). If this case were to hold for O((ρ − n/p) log(1/ε)) iterations, we would have produced an ε-global minimizer of (1).

On the other hand, if ||g(x)|| ≤ 1, then from g(x) = e − (ρ/p(x)) X(∇p(x) − A^T y) we must have

$$0 \le \frac{\rho}{p(x)} X\big(\nabla p(x) - A^T y\big) \le 2e.$$

In other words,

$$\frac{1}{p(x)}\nabla p(x) - A^T y' \ge 0 \quad \text{and} \quad x_j\Big(\frac{1}{p(x)}\nabla p(x) - A^T y'\Big)_j \le \frac{2}{\rho} \ \ \forall j,$$

where (1/p(x)) ∇p(x) can be viewed as a normalized gradient vector of the original Lp minimization and y' = (1/p(x)) y. The first condition indicates that the Lagrange multiplier y' is feasible. For the second inequality, by choosing ρ ≥ 2n/ε we have ||X((1/p(x)) ∇p(x) − A^T y')||_1 ≤ ε, which implies that x is an ε-complementarity solution. Concluding the analysis above, we have the following lemma.

Lemma 6. The interior-point algorithm returns an ε-stationary or ε-complementarity point of (1) in no more than O((n/ε) log(1/ε)) iterations.

A more careful computation makes the complementarity point also satisfy the second order optimality condition; see [10]. Combining these observations with Theorem 5, we immediately conclude the convergence of interior-point algorithms.

Corollary 7. Interior-point algorithms, starting from the center of the polytope (the maximizer of the problem), generate a sequence of interior points converging to a local minimizer and compute an approximate local minimizer of the Lp minimization problem in FPTAS time.

Thus, interior-point algorithms, including the simple affine-scaling algorithm that always moves along a descent direction, can be effective (with FPTAS running time) in solving the Lp minimization problem as well.
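For concreteness, here is a minimal Python sketch of one possible implementation of the potential reduction iteration built from (5) and (6); the starting point, the choices ρ = 2n/ε and β = 1/2, the stopping rule ||g(x)|| ≤ 1, and all names are illustrative assumptions rather than the exact code used for the experiments in the next section.

```python
import numpy as np

def potential_reduction_lp(A, b, x0, p=0.5, eps=1e-6, beta=0.5, max_iter=1000):
    """Affine-scaled potential reduction sketch for
    min sum_j x_j^p  s.t.  Ax = b, x >= 0, started from an interior point x0 > 0.
    Assumes A has full row rank."""
    n = A.shape[1]
    rho = 2.0 * n / eps                       # rho >= 2n/eps, as in the analysis
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iter):
        px = float(np.sum(x ** p))            # p(x)
        grad = p * x ** (p - 1.0)             # gradient of p(x)
        c = (rho / px) * (x * grad) - 1.0     # (rho/p(x)) X grad(p) - e, in scaled space
        AX = A * x                            # A X (scale each column j by x_j)
        # Project c onto the null space of AX:  Pc = c - (AX)^T w, with (AX)(AX)^T w = (AX) c
        w = np.linalg.solve(AX @ AX.T, AX @ c)
        g = -(c - AX.T @ w)                   # g(x) = -Pc
        if np.linalg.norm(g) <= 1.0:          # eps-stationary / eps-complementarity point
            break
        d_prime = beta * g / np.linalg.norm(g)
        x = x * (1.0 + d_prime)               # x^+ = x + X d'; stays > 0 since ||d'|| = beta < 1
    return x
```

In the ||g(x)|| ≤ 1 branch the returned x plays the role of the ε-stationary point of Lemma 6; a practical code would add safeguards (rank checks, a cap on ρ, a line search) and a Phase-I step to produce the interior starting point x0.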

5 Computational Experiment of the Interior-Point Algorithm

In this section, we compute a solution of (2) using the interior-point algorithm above and compare it with the solution of the L1 problem, i.e., the solution of (2) with p = 1, which is also computed by an interior-point algorithm for linear programming [9]. Our preliminary results reinforce our reasoning that interior-point algorithms likely avoid some local minimizers on the boundary and recover the sparse solution.

For each sparsity level ||x||_0 = s, s = 1, 2, ..., 20, we construct 1000 random pairs (A, x) with matrices A of size 30 by 120 and vectors x ∈ R^120. Given the basis matrix, the measurement vector b = Ax is known. We use several basis design matrices A to test our algorithms. In particular, we set A = [M, −M], where M is one of the following matrices: (1) a sparse binary random matrix, with only a small number of ones in each column; (2) a Gaussian random matrix, whose entries are i.i.d. Gaussian random variables; (3) a Bernoulli random matrix, whose entries are i.i.d. Bernoulli random variables. We note that Gaussian or Bernoulli random matrices satisfy the restricted isometry property [2, 3], i.e., the columns of the basis matrix are nearly orthogonal, while sparse binary random matrices only satisfy a weaker form of the restricted isometry property [1]. Because two copies of the same random matrix are concatenated in A, column orthogonality is hard to maintain, which makes sparse recovery by the L1 problem prone to failure.

We solve (2) with p = 1/2 by the interior-point algorithm developed above and compare the rate of successful sparse recovery with that of the L1 solutions. A solution is considered to successfully recover x if the l2 distance between the solution and x is less than 10^{-3}.
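The data-generation step of the experiment can be sketched as follows (the number of ones per column in the binary case, the value distribution of the nonzero entries of x, and the function names are our illustrative assumptions; the paper does not specify them):

```python
import numpy as np

def make_instance(m=30, n_half=60, s=5, kind="gaussian", rng=None):
    """Generate a test triple (A, x, b) with A = [M, -M] of size 30-by-120."""
    rng = rng if rng is not None else np.random.default_rng()
    if kind == "gaussian":
        M = rng.standard_normal((m, n_half))
    elif kind == "bernoulli":
        M = rng.choice([-1.0, 1.0], size=(m, n_half))
    elif kind == "binary":                        # sparse binary: a few ones per column
        M = np.zeros((m, n_half))
        for j in range(n_half):
            M[rng.choice(m, size=3, replace=False), j] = 1.0
    else:
        raise ValueError(kind)
    A = np.hstack([M, -M])
    x = np.zeros(2 * n_half)
    support = rng.choice(2 * n_half, size=s, replace=False)
    x[support] = rng.standard_normal(s)           # sparse signal with s nonzero entries
    return A, x, A @ x

def recovered(x_hat, x_true, tol=1e-3):
    """Success criterion used above: l2 distance below 1e-3."""
    return bool(np.linalg.norm(x_hat - x_true) <= tol)
```

Each cell of Figure 1 would then average recovered(...) over the 1000 trials for a given matrix type and sparsity level, with the candidate solution computed either by the Lp interior-point algorithm or by the L1 linear program.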

[Figure 1: three panels, (a) Binary Random Matrices, (b) Gaussian Random Matrices, (c) Bernoulli Random Matrices, each plotting Frequency of Success (0 to 1) against Sparsity (s = 1, ..., 20) for the Lp and L1 solutions.]

Figure 1: Successful sparse recovery rates of the Lp and L1 solutions, with matrix A constructed from (a) a sparse binary random matrix; (b) a Gaussian random matrix; (c) a Bernoulli random matrix.

The phase transitions of the successful sparse recovery rates of the Lp and L1 problems for the three classes of basis matrices are plotted in Figure 1. We observe that the L0.5 interior-point algorithm identifies the sparse solution x more often than the L1 algorithm does. Moreover, we note that when the basis matrix is binary, the rates of successful sparse recovery are comparatively lower (when sparsity s = 4, for example, the successful recovery rate of the Lp solutions is about 95% for binary random matrices compared with almost 100% for Gaussian or Bernoulli random matrices; for the L1 solutions it is about 80% for binary random matrices compared with about 90% for Gaussian or Bernoulli random matrices). This may be explained by the fact that sparse binary random matrices have even worse column orthogonality than the Gaussian or Bernoulli cases. Our simulation results also show that the interior-point algorithm for the Lp problem with 0 < p < 1 runs as fast as the interior-point method for the L1 problem [9], which makes the interior-point method competitive for large-scale sparse recovery problems.

References

[1] Radu Berinde, Anna C. Gilbert, Piotr Indyk, Howard J. Karloff, and Martin J. Strauss, "Combining geometry and combinatorics: A unified approach to sparse signal recovery", preprint, 2008.

[2] Emmanuel J. Candès and Terence Tao, "Decoding by linear programming", IEEE Transactions on Information Theory, 51(2005), 4203-4215.

[3] Emmanuel J. Candès and Terence Tao, "Near optimal signal recovery from random projections: Universal encoding strategies", IEEE Transactions on Information Theory, 52(2006), 5406-5425.

[4] Rick Chartrand, "Fast algorithms for nonconvex compressive sensing: MRI reconstruction from very few data", in IEEE International Symposium on Biomedical Imaging (ISBI), 2009.


[5] Xiaojun Chen, Fengmin Xu, and Yinyu Ye, "Lower Bound Theory of Nonzero Entries in Solutions of l2-lp Minimization", Technical Report, The Hong Kong Polytechnic University, 2009.

[6] David Donoho, "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution", Technical Report, Stanford University, 2004.

[7] Jianqing Fan and Runze Li, "Variable selection via nonconcave penalized likelihood and its oracle properties", Journal of the American Statistical Association, 96(2001), 1348-1360.

[8] Balas K. Natarajan, "Sparse approximate solutions to linear systems", SIAM Journal on Computing, 24(1995), 227-234.

[9] Yinyu Ye, Michael J. Todd, and Shinji Mizuno, "An O(√n L)-Iteration Homogeneous and Self-dual Linear Programming Algorithm", Mathematics of Operations Research, 19(1994), 53-67.

[10] Yinyu Ye, "On the complexity of approximating a KKT point of quadratic programming", Mathematical Programming, 80(1998), 195-211.

