2016 15th IEEE International Conference on Machine Learning and Applications

On the L1-norm Approximation of a Matrix by Another of Lower Rank

Nicholas Tsagkarakis†, Panos P. Markopoulos‡, and Dimitris A. Pados†∗

† Department of Electrical Engineering, The State University of New York at Buffalo, Buffalo, NY 14260 USA
Email: {ntsagkar, pados}@buffalo.edu
‡ Department of Electrical and Microelectronic Engineering, Rochester Institute of Technology, Rochester, NY 14623 USA
Email: [email protected]

Abstract—In the past decade, there has been a growing documented effort to approximate a matrix by another of lower rank minimizing the L1-norm of the residual matrix. In this paper, we first show that the problem is NP-hard. Then, we introduce a theorem on the sparsity of the residual matrix. The theorem sets the foundation for a novel algorithm that outperforms all existing counterparts in the L1-norm error minimization metric and exhibits high outlier resistance in comparison to usual L2-norm error minimization in machine learning applications.

Keywords—Low-rank approximation; L1-norm; principal component analysis (PCA); erroneous data; faulty measurements; machine learning; outlier resistance; subspace signal processing; uniform feature preservation.

∗ Corresponding author. This work was supported by the National Science Foundation under Grant ECCS-1462341.

I. INTRODUCTION AND PROBLEM STATEMENT

As an optimization problem, Principal-Component Analysis (PCA) comes in several equivalent formulations. In its original form, PCA is the linear method of finding a basis that describes a low-dimensional subspace (line, plane, etc.) whereon data projections have the minimum sum of squared Euclidean distances from the original data X ∈ R^{D×N} (minimization of the L2-norm of the residual error). Mathematically, this PCA formulation seeks a pair (Q_L2, Z_L2) that solves

    minimize_{Q ∈ R^{D×K}, Z ∈ R^{K×N}}  ||QZ − X||_2^2 = Σ_{n=1}^{N} ||Q z_n − x_n||_2^2                (1)

where K < rank(X) ≤ min{D, N} and ||·||_2 denotes the element-wise L2-norm of the matrix argument, defined as ||A||_2 ≜ (Σ_{i,j} |a_{i,j}|^2)^{1/2}. Then, Q_L2 spans the K-dimensional principal subspace of the data matrix X and Z_L2 contains the linear coefficients of the columns of Q_L2 that give the minimum possible L2-norm residual error. This linear method is usually referred to as least-squares regression. In view of the fundamental Projection Theorem [1], the minimum-squared-error subspace Q_L2 also maximizes the L2-norm of the thereon projected data; i.e., (1) is equivalent, in terms of solution, to

    maximize_{Q ∈ R^{D×K}; Q^T Q = I_K}  ||Q^T X||_2^2 = Σ_{n=1}^{N} ||Q^T x_n||_2^2                (2)

where I_K denotes the size-K identity matrix.¹ PCA, as defined in (1) and (2), is a well-behaving problem with many strengths, including the computational simplicity of its solution obtained by means of the singular value decomposition (SVD) [2].

¹ In this version of the problem, a constraint is required to regulate the objective function. Orthonormality of Q is sufficient.
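For reference (our own illustration, not part of the original text), the L2-optimal pair in (1)-(2) is obtained from a K-truncated SVD. The following Python sketch uses toy data and function names of our choosing; the injected outlier hints at the sensitivity discussed next.

```python
import numpy as np

def l2_rank_k_approximation(X, K):
    """Solve (1)/(2) by truncated SVD: Q_L2 spans the top-K left singular
    subspace of X and, for fixed orthonormal Q, the optimal Z is Q^T X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Q_L2 = U[:, :K]            # orthonormal basis, Q^T Q = I_K
    Z_L2 = Q_L2.T @ X          # least-squares coefficients
    return Q_L2, Z_L2

# Toy usage: a rank-3 matrix plus one grossly corrupted measurement.
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 60))
X[0, 0] += 100.0               # single outlying entry
Q, Z = l2_rank_k_approximation(X, K=3)
print(np.abs(Q @ Z - X).sum()) # element-wise L1 residual of the L2-optimal fit
```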

However, PCA also has an important disadvantage. Standard-PCA-derived subspaces tend to be very far from the true (nominal) underlying subspace when X is corrupted by ambient errors that follow heavy-tail distributions. To recognize this, we think as follows. Each term in the summation in (1) represents the severity of the estimation/approximation error between the measured datum and its projection onto the candidate subspace. Therefore, whenever a data point lies far away from the nominal subspace (outlier), it affects the estimate greatly. In fact, the loss grows quadratically with the Euclidean distance of the datum from the nominal subspace. Therefore, just one single outlying observation point may spoil the least-squares estimate. In search of alternative outlier-resistant approaches to PCA, two main frameworks have been developed: (i) Weighted PCA (WPCA) [3]-[5] and (ii) minimum sum-of-deviations, also known as L1-regression, least absolute errors, or plainly L1 principal component analysis (L1-PCA) [6]-[11].

The second framework of outlier-resistant PCA methods consists of several distinct L1-based problem formulations. Remarkable progress on one of these formulations, known as maximum-projection L1-PCA, has been made recently by Markopoulos et al. in [10]-[15], where the authors presented for the first time an optimal solution to the problem

    maximize_{Q ∈ R^{D×K}; Q^T Q = I_K}  ||Q^T X||_1                (3)

where ||·||_1 denotes the element-wise L1-norm of its matrix argument, defined as ||A||_1 ≜ Σ_{i,j} |a_{i,j}|. The problem in (3) derives from removing the squared emphasis placed upon each datum in the standard PCA formulation in (2) and has been shown to offer solutions that are highly resistant to the presence of outliers in X. The earliest known L1-PCA formulation, however, extensively studied since its introduction in 1930 [6], came as a modification of (1) by removing the squared emphasis on the individual point errors and is formally described as

    minimize_{Q ∈ R^{D×K}, Z ∈ R^{K×N}}  ||QZ − X||_1.                (4)

This formulation, which can be interpreted as the L1-norm approximation of the data matrix X by another of lower rank K, is the exclusive focus of the present work. Despite past strong efforts, the exact solution to (4) remains to date unknown. Arguably, the two most promising approximate methods for solving (4) are presented in [7] and [8]. In [7], given an initial guess Q, the proposed algorithm (henceforth referred to as the "alternating" algorithm) solves alternatingly over Z and Q until convergence. The method converges to a stationary pair (Q̂, Ẑ). In [8], the authors present a method making use of a polynomial-time algorithm for projecting D-dimensional data onto an L1-best-fit (D − 1)-dimensional subspace. Their algorithm (henceforth referred to as the "greedy" algorithm) generates, in a greedy fashion, a sequence of hyperplanes whose normal vectors are successive directions of minimum variation in the data. The advantage of this procedure is that each single-dimension reduction is provably optimally performed. Its disadvantage is the lack of global optimality, since the optimal i-dimensional subspace is not necessarily contained in the optimal (i + 1)-dimensional subspace. In this paper, (i) we present a new formulation of the problem in (4) which reveals new insights, (ii) we prove formally that the problem is NP-hard, and (iii) we propose a new polynomial-time algorithm that provides a suboptimal solution and exhibits superior performance compared to the current state-of-the-art techniques.
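As a concrete illustration of the baseline alternating scheme described above, the following Python sketch gives one plausible reading of it; this is our own rendering, not the exact procedure of [7]. The helper name l1_fit, the random initialization, and the fixed iteration count (standing in for a convergence check) are our assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def l1_fit(A, b):
    """min_x ||A x - b||_1 as an LP with auxiliary variables t (cf. (6))."""
    M, m = A.shape
    c = np.concatenate([np.zeros(m), np.ones(M)])      # minimize sum_i t_i
    A_ub = np.block([[ A, -np.eye(M)],                  #  A x - t <= b
                     [-A, -np.eye(M)]])                 # -A x - t <= -b
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x[:m]

def alternating_l1(X, K, n_iter=30, seed=0):
    """Alternate L1 fits over Z (column by column) and Q (row by row)."""
    D, N = X.shape
    rng = np.random.default_rng(seed)
    Q = np.linalg.qr(rng.standard_normal((D, K)))[0]    # random initial guess
    Z = np.zeros((K, N))
    for _ in range(n_iter):
        for n in range(N):                              # Z-step
            Z[:, n] = l1_fit(Q, X[:, n])
        for d in range(D):                              # Q-step on row d of Q
            Q[d, :] = l1_fit(Z.T, X[d, :])
    return Q, Z
```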

II. PRELIMINARIES

A. L1-norm Vector Approximation Analysis

A key idea in solving (4) is to use the well-known sparsity properties of L1-based cost functions and expose and exploit the problem's finiteness. Careful examination of the following simpler, but closely related, problem demonstrates the concept. The problem of L1-norm vector approximation

    minimize_{x ∈ R^{m×1}}  ||Ax − b||_1 = Σ_{i=1}^{M} |A_{i,:} x − b_i|,                (5)

where A ∈ R^{M×m} and b ∈ R^{M×1} with M > m, is a convex (in fact, linear) problem solvable by any general-purpose convex solver. The solution set is an affine set of dimension m − rank(A). Rank deficiency of A is not a concern because we can assume that all columns of A are normalized and orthogonal to each other and all orthogonal to vector b (for details see [7]). The following theorem motivates a new way to solve (5).

Theorem 1. For any A ∈ R^{M×m}, there is at least one optimal solution x_opt ∈ R^{m×1} to (5) such that A x_opt equals b ∈ R^{M×1} in at least m entries.

Proof: Problem (5) can be transformed into the linear program (LP)

    minimize_{x ∈ R^{m×1}, t ∈ R^{M×1}}  Σ_{i=1}^{M} t_i
    subject to  −t_i ≤ A_{i,:} x − b_i ≤ t_i,  ∀ i ∈ [1 : M]                (6)

by the introduction of M auxiliary variables t ≜ [t_1, t_2, ..., t_M]^T. Denote by (x_opt, t_opt) an optimal pair of (6). As with every LP, (6) has an optimal solution at the boundary of its feasibility set. Therefore, at least M + m inequality constraints are active [16, Theorem 5-1]. Moreover, at optimality and for any j, it is impossible that both −t_opt,j ≤ A_{j,:} x_opt − b_j and A_{j,:} x_opt − b_j ≤ t_opt,j are non-active. At least one of these constraints has to be achieved with equality because, otherwise, we could enforce equality by setting t_opt,j = |A_{j,:} x_opt − b_j|, thus decreasing the metric Σ_i t_opt,i (contradiction to t_opt being optimal). Putting the above two facts together, we conclude that there are at least m zeros in the vector t_opt. Since the entries of t represent the terms of the objective function of (5), m of the terms in Σ_{i=1}^{M} |A_{i,:} x_opt − b_i| are zero as well.

Theorem 1 asserts that (at least) m of the entries of b will be represented without error by A x_opt. In other words, if b is an observable set of features spawned by the sought-after causal factors x, then at least m of these features will be preserved under the L1-norm vector approximation defined in (5). Concurrently, there will be at most M − m entries of the residual vector A x_opt − b with non-zero values, the magnitudes of which determine the error of the approximation. We shall index the latter entries by the index set J_opt ⊂ [1 : M], where J_opt has cardinality M − m (|J_opt| = M − m). It is interesting to observe that knowledge of J_opt (and therefore of its complement, J_opt^C) can lead us to the optimal x_opt by means of x_opt = A_{J_opt^C,:}^{-1} b_{J_opt^C}.
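The sparsity property of Theorem 1 is easy to verify numerically. The snippet below (our own check, not from the paper) solves the LP (6) directly with SciPy and counts the exact matches; for generic data the LP optimum is a vertex, so at least m residuals vanish to numerical precision.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
M, m = 12, 3
A = rng.standard_normal((M, m))
b = rng.standard_normal(M)

# LP form (6): variables [x, t], minimize sum(t) s.t. -t <= Ax - b <= t.
c = np.concatenate([np.zeros(m), np.ones(M)])
A_ub = np.block([[ A, -np.eye(M)], [-A, -np.eye(M)]])
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
x_opt = res.x[:m]

residual = A @ x_opt - b
print("L1 error:", np.abs(residual).sum())
# Theorem 1: at least m entries of b are matched exactly (numerically zero).
print("exact matches:", int(np.sum(np.abs(residual) < 1e-8)), ">= m =", m)
```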

Surprisingly, a solution to (5) can be found by solving a system of linear equations in which the left-hand-side matrix is a size-m selection of the rows of A. All the different selections of rows of A will yield different candidate vectors, one of which is guaranteed to be optimal. Therefore, an equivalent version of (5) is

    minimize_{J ⊂ [1:M]; |J| = M−m}  ||A A_{J^C,:}^{-1} b_{J^C} − b||_1.                (7)

The complexity of a brute-force (exhaustive search) solution to (7) would be of cost in the order of (M choose m) m^3 elementary operations, which is of course impractical. Nevertheless, an important step has been made. As discussed in the sequel, recasting (5) into a search over a finite set will serve toward devising an efficient algorithm to solve (4).

Remark: The objective function of the problem in (7) is the L1-norm of a vector with non-zero values only at the M − m entries indexed by J. Ignoring the zero-valued entries, we can rewrite (7) as

    minimize_{J ⊂ [1:M]; |J| = M−m}  ||A_{J,:} A_{J^C,:}^{-1} b_{J^C} − b_J||_1.                (8)
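A minimal sketch of the brute-force search in (7)/(8) follows (our own illustration, with function names and toy sizes assumed); on the same (A, b) it returns the same optimum as the LP form (6), at a cost that grows combinatorially.

```python
import numpy as np
from itertools import combinations

def l1_fit_enumerative(A, b):
    """Brute-force (7)/(8): try every size-m row selection (playing the role
    of J^C), solve the square system exactly on it, keep the least L1 error."""
    M, m = A.shape
    best_x, best_err = None, np.inf
    for rows in combinations(range(M), m):
        A_sub = A[list(rows), :]
        if np.linalg.matrix_rank(A_sub) < m:   # skip singular selections
            continue
        x = np.linalg.solve(A_sub, b[list(rows)])
        err = np.abs(A @ x - b).sum()
        if err < best_err:
            best_x, best_err = x, err
    return best_x, best_err

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
x_star, err = l1_fit_enumerative(A, b)
print(err)   # matches the LP optimum of (5)/(6) on the same data
```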

B. Problem Hardness

Problem (4) has remained unsolved despite strong efforts over many years [6]-[8]. This is a clear indication that the problem belongs to a class of especially hard problems. Indeed, we formally show here for the first time that the problem is NP-hard in the size pair (D, K). By introducing DN new variables, problem (4) can be rewritten as the bilinearly constrained linear program

    minimize_{Q ∈ R^{D×K}, Z ∈ R^{K×N}, T ∈ R^{D×N}}  Σ_{i=1}^{D} Σ_{j=1}^{N} t_{i,j}
    subject to  −t_{i,j} ≤ Σ_{k=1}^{K} q_{i,k} z_{k,j} − x_{i,j} ≤ t_{i,j},  ∀ (i,j) ∈ [1:D]×[1:N].                (9)

Problem (9) is an instance of a quadratically constrained quadratic program (QCQP) with non-semidefinite constraints and is, therefore, formally NP-hard [17].

III. A NOVEL SUBOPTIMAL ALGORITHM

A new polynomial-time algorithm is presented in this section for solving (4). While the proposed algorithm does not guarantee global optimality, our numerical studies indicate that it attains superior performance compared to state-of-the-art techniques [7], [8].

A. Problem Reformulation

In the first step of our algorithmic developments, we seek to utilize the machinery developed in Section II. To that end, we rewrite (4) as

    minimize_{Q ∈ R^{D×K}}  Σ_{n=1}^{N}  min_{z ∈ R^{K×1}} ||Qz − x_n||_1.                (10)

Here, we have simply made the observation that the summation in the objective function of (4) is a separable function w.r.t. the columns of matrix Z. Therefore, for given Q, we have broken the problem minimize_{Z ∈ R^{K×N}} ||QZ − X||_1 into N disjoint (separate) core problems. Next, we notice that the inner problems in (10) are instances of (5); a per-column sketch is given below.
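The decoupling in (10) means that, for a fixed Q, the optimal Z is obtained one column at a time. The following Python sketch (our own, with assumed names) solves each inner problem through the LP form (6); note that the constraint matrix depends only on Q and is built once.

```python
import numpy as np
from scipy.optimize import linprog

def l1_coefficients(Q, X):
    """Given a fixed basis Q (D x K), solve the N decoupled inner problems of
    (10): z_n = argmin_z ||Q z - x_n||_1 for every column x_n of X."""
    D, K = Q.shape
    Z = np.zeros((K, X.shape[1]))
    c = np.concatenate([np.zeros(K), np.ones(D)])
    A_ub = np.block([[ Q, -np.eye(D)], [-Q, -np.eye(D)]])
    for n in range(X.shape[1]):
        b_ub = np.concatenate([X[:, n], -X[:, n]])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
        Z[:, n] = res.x[:K]
    return Z
```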

By this observation, we can strengthen Theorem 1 as follows.

Theorem 2. There is at least one optimal approximation matrix Q_opt Z_opt equal to X in at least K entries in each column.

Proof: Recall that Q_opt is the solution to (10) and Z_opt ≜ argmin_{Z ∈ R^{K×N}} ||Q_opt Z − X||_1. The proof follows directly from the application of Theorem 1 to each inner problem in (10). That is, [Q_opt Z_opt]_{:,n} will be equal to x_n in at least K entries.

At optimality, Q_opt Z_opt will be different from X in at most (D − K)N entries (D − K per column), indexed by J_1^opt, ..., J_N^opt. Theorem 2 above shows that the optimal approximation matrix will induce no error in at least K entries per column. The question that remains unanswered is which of these entries/features are preserved. In view of Theorem 2 and in accordance with the transformation of (5) into (8), (10) can be cast into the mixed optimization problem²

    minimize_{Q ∈ R^{D×K}, J_1, ..., J_N; J_n ⊂ [1:D], |J_n| = D−K}  f(Q; J_1, ..., J_N)                (11)

where

    f(Q; J_1, ..., J_N) ≜ Σ_{n=1}^{N} ||Q_{J_n,:} Q_{J_n^C,:}^{-1} x_{J_n^C,n} − x_{J_n,n}||_1.

² A mixed problem is one over both discrete and continuous variables.
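For clarity (our own illustration, not part of the paper), the objective f in (11) is straightforward to evaluate for a candidate Q and index sets J_1, ..., J_N; each term requires only a K × K linear solve.

```python
import numpy as np

def f_value(Q, X, J_list):
    """Evaluate f(Q; J_1,...,J_N) of (11). For column n, J_list[n] holds the
    D-K indices allowed to be in error; the remaining K indices (J_n^C) are
    matched exactly by solving the K x K system Q[J_n^C,:] z = x[J_n^C, n]."""
    D, K = Q.shape
    total = 0.0
    for n, J in enumerate(J_list):
        Jc = np.setdiff1d(np.arange(D), J)        # complement, |J_n^C| = K
        z = np.linalg.solve(Q[Jc, :], X[Jc, n])   # exact fit on J_n^C
        total += np.abs(Q[J, :] @ z - X[J, n]).sum()
    return total
```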

B. Outline of an Optimal Algorithm

The number of all possible values that the tuple (J_1, ..., J_N) can take is (D choose K)^N. Since these values are finitely many, exhaustive enumeration is possible. Therefore, solving (11) for fixed (J_1, ..., J_N) is the key to a finite-step optimal solver for (4). In Fig. 1 we present the first form of such an algorithm. The algorithm is the end-product of all the ideas discussed so far. In simple words, as proven by Theorem 2, certain entries in the matrices X and Q_opt Z_opt (that are unknown a priori) will share the same value. The algorithm in Fig. 1 simply checks all possible combinations of those entries. In more technical terms, we construct the set of candidate optimal Q's

    𝒞 ≜ { argmin_{Q ∈ R^{D×K}} f(Q; J_1, ..., J_N) : J_n ⊂ [1:D], |J_n| = D−K, ∀ n ∈ [1:N] }                (12)

and perform, for each Q ∈ 𝒞, exhaustive evaluation and comparison in the metric of interest. Upon completion of these comparisons, we have obtained the guaranteed optimal L1-norm low-rank approximation of matrix X. However, regretfully, constructing the candidate set 𝒞 is a task that entails solving instances of (11) for fixed (J_1, ..., J_N), a problem for which, to the best of our knowledge, no exact solver (exponential or otherwise) exists. Thus, the algorithm outlined above is rendered incomplete. Despite the difficulty in solving instances of problem (11) in the general case, one can find conditions under which the problem becomes convex. In the following section, we investigate one such condition and assess the degradation of the quality of the solution caused by enforcing it.

Algorithm 1: Optimal L1-norm low-rank matrix approximation
Input: Real data matrix X_{D×N} and integer K < rank(X)
1: Construct set 𝒞 as in (12)
2: For all Q_i ∈ 𝒞
3:     Z_i ← argmin_Z ||Q_i Z − X||_1
4:     m_i ← ||Q_i Z_i − X||_1
5: i_opt ← argmin_{i ∈ [1:(D choose K)^N]} m_i
   Q_opt ← Q_{i_opt}
   Z_opt ← Z_{i_opt}
Output: Y ← Q_opt Z_opt

Figure 1. Outline of an optimal algorithm (incomplete) for solving (4).

C. Proposed Algorithm: Uniform Feature Preservation

The proposed algorithm is a derivative of the algorithm in Fig. 1. It follows the same principle of constructing and searching exhaustively within a finite set of candidates. For every n, let J_n^opt denote the size-(D − K) set of indices that index the entries where the n-th columns of X and Q_opt Z_opt differ. Now, we pose the following statement:

    J_i^opt = J_j^opt  ∀ i ≠ j.                (13)

If statement (13) were true for every input matrix X and integer K, then problem (4) would be solvable in polynomial time. This contradicts the fact that the problem is NP-hard, as we showed in Section II. So, we do not expect (13) to be true in general. Nevertheless, trading optimality for low complexity, we propose enforcing the uniform feature preservation condition (13) in the algorithm of Fig. 1 to obtain the following suboptimal but practical algorithm for near-solving (4). Interestingly, our numerical computer simulations indicate that, although not all sets of indices {J_n^opt}_{n=1}^{N} are identical to one another, there usually exists one with exceptionally high multiplicity. The higher the multiplicity, the more accurate the final solution obtained by the proposed algorithm. In the extreme case that the multiplicity equals N, condition (13) holds and (4) is solved exactly (optimally). This case holds, for example, when K = D − 1.

In the sequel, we present the mathematical details that allow our problem of interest (4) to be solved in polynomial time under assumption (13). The sole reason that the algorithm in Fig. 1 is not implementable is the hardness of (11). Under (13), however, problem (11) becomes convex via a change of variables. Define J_0 to be the common set of indices and g(C; J_0) ≜ Σ_{n=1}^{N} ||x_{J_0,n} + C x_{J_0^C,n}||_1 = ||X_{J_0,:} + C X_{J_0^C,:}||_1. Then, we can solve the linear problem

    C_opt = argmin_{C ∈ R^{(D−K)×K}}  g(C; J_0).                (14)

By the fact that Q is unconstrained in (11), we deduce that

    min_Q f(Q; J_0, ..., J_0) = min_C g(C; J_0)                (15)

and

    C_opt = −Q^U_{J_0,:} (Q^U_{J_0^C,:})^{-1}.                (16)

The superscript "U" above represents the notion of solving under uniform feature preservation. Next, we can extract Q^U from (16):

    C_opt = −Q^U_{J_0,:} (Q^U_{J_0^C,:})^{-1}
    ⇒ [C_opt  I_{D−K}] [Q^U_{J_0^C,:} ; Q^U_{J_0,:}] = 0
    ⇒ [C_opt  I_{D−K}] P_{J_0}^T Q^U = 0
    ⇒ Q^U = N([C_opt  I_{D−K}] P_{J_0}^T)

where P_{J_0} is the permutation matrix

    P_{J_0} = [ e_{J_0^C(1)} ... e_{J_0^C(K)}  e_{J_0(1)} ... e_{J_0(D−K)} ],                (17)

e_i is the i-th column of I_D, and N(A) denotes any matrix in R^{D×K} that constitutes a basis for the nullspace of a full-rank A ∈ R^{(D−K)×D}. Finally, having calculated Q^U, the proposed algorithm returns the pair (Q^U, Z^U), where Z^U = argmin_Z ||Q^U Z − X||_1 (a convex problem), as an approximate solution to (4). We conclude that the optimal solution of (11) under constraint (13) can be obtained by (i) solving a linear program and (ii) computing a basis for the nullspace of a matrix. Also, we note that, for the special case K = D − 1, the proposed algorithm simplifies to the algorithm in [8] and calculates the optimal subspace. The proposed algorithm is summarized in pseudocode in Fig. 2.

Algorithm 2: Proposed L1-norm low-rank matrix approximation by uniform feature preservation
Input: Real data matrix X_{D×N} and integer K < rank(X)
1: i ← 0
2: For all J_0 ⊂ [1:D] such that |J_0| = D − K
2a:     i ← i + 1
2b:     C_i^opt ← argmin_C ||X_{J_0,:} + C X_{J_0^C,:}||_1
2c:     Q_i ← N([C_i^opt  I_{D−K}] P_{J_0}^T)
2d:     Z_i ← argmin_Z ||Q_i Z − X||_1
2e:     m_i ← ||Q_i Z_i − X||_1
3: i_opt ← argmin_{i ∈ [1:(D choose K)]} m_i
4: Q^U ← Q_{i_opt}
5: Z^U ← Z_{i_opt}
Output: Y^U ← Q^U Z^U

Figure 2. The proposed algorithm for solving (4) suboptimally.
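A compact Python sketch of Fig. 2 follows, as we read it; it is an illustration under our own naming and solver choices (SciPy's linprog for the LPs in (6)/(14) and scipy.linalg.null_space for step 2c), intended for small problem sizes rather than as a reference implementation.

```python
import numpy as np
from itertools import combinations
from scipy.linalg import null_space
from scipy.optimize import linprog

def l1_fit(A, b):
    """min_x ||A x - b||_1 via the LP form (6)."""
    M, m = A.shape
    c = np.concatenate([np.zeros(m), np.ones(M)])
    A_ub = np.block([[A, -np.eye(M)], [-A, -np.eye(M)]])
    b_ub = np.concatenate([b, -b])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
    return res.x[:m]

def proposed_l1_lra(X, K):
    """Sketch of Algorithm 2: enumerate the candidate index sets J_0, solve
    the LP (14) for C row by row, recover Q from the nullspace form of (16),
    and keep the candidate with the smallest L1 residual."""
    D, N = X.shape
    best = (None, None, np.inf)
    for J0 in combinations(range(D), D - K):
        J0 = np.array(J0)
        Jc = np.setdiff1d(np.arange(D), J0)
        # Step 2b: the rows of C decouple; row r solves min_c ||X[J0[r],:] + c X[Jc,:]||_1.
        C = np.vstack([l1_fit(X[Jc, :].T, -X[J0[r], :]) for r in range(len(J0))])
        # Step 2c: Q spans the nullspace of [C  I_{D-K}] P^T, assembled column-wise.
        Mmat = np.zeros((D - K, D))
        Mmat[:, Jc] = C
        Mmat[:, J0] = np.eye(D - K)
        Q = null_space(Mmat)                     # D x K orthonormal basis
        # Step 2d: per-column L1 coefficients for the fixed Q.
        Z = np.column_stack([l1_fit(Q, X[:, n]) for n in range(N)])
        err = np.abs(Q @ Z - X).sum()            # Step 2e
        if err < best[2]:
            best = (Q, Z, err)
    return best
```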

IV. NUMERICAL EXPERIMENTS

We test in two separate experiments the performance of the proposed algorithm against the current state-of-the-art alternating [7] and greedy [8] algorithms. The first experiment measures the objective L1-error metric itself. The second experiment applies the algorithms to the structure-from-motion (SFM) problem discussed by Tomasi and Kanade in [18].

A. L1-norm Error

In this experiment, we study the performance of the proposed algorithm in approximating one matrix by another of lower rank with as low an L1-error as possible. We start by generating randomly, under the Gaussian distribution N(0, 1), a D = 20 by N = 60 matrix of rank 20. Then, we apply all three algorithms ([7], [8], and the proposed algorithm of Fig. 2) for every K < D. The achieved approximation errors are stored and averaged over 400 independent trials. In Fig. 3 we plot versus K the average relative error increase of the two other methods against the proposed one. The proposed algorithm outperforms [7] and [8] for all values of K. The algorithm can offer up to about 30% improvement against the alternating method [7] for K = 17. For K = D − 1 = 19, the proposed algorithm and the greedy algorithm [8] are both optimal and have, of course, exactly the same performance.

Figure 3. Relative error increase against proposed algorithm versus rank K (D = 20, N = 60).

B. Structure from Motion

In this section, we investigate the behavior of the proposed algorithm in an application known as structure-from-motion (SFM). This application, which falls into the category of video-stream processing, has the objective of recovering scene geometry and camera motion from a sequence of images. Our assumptions are that (i) the distance of the camera from the scene does not change significantly and can be approximated as constant and (ii) the frames are orthographic projections. The geometry of a scene can be inferred from the geometry of a few tracking points previously extracted from the sequence. We denote the number of tracking points by M and the number of frames by F. To determine the geometry of the points is to identify their coordinates in 3D space, described by a matrix S of size 3-by-M. Since these points are tracked as the video progresses, we know their positions on each frame; i.e., we have F matrices of size 2-by-M that contain the 2D coordinates of every point when projected on the camera plane. These matrices, under orthography, are given by

    P^(f) S,  ∀ f ∈ [1 : F]                (18)

where P^(f) is a 2-by-3 unitary matrix that spans the f-th camera plane. Again, the goal is to both estimate the matrix S and identify all camera planes {P^(f)}_{f=1}^{F}, jointly. To do this, we first concatenate vertically all observation matrices {P^(f) S}_{f=1}^{F} into

    X_{2F×M} = [P^(1); P^(2); ...; P^(F)] S = P_{2F×3} · S_{3×M}.

Notice that the matrix P is the vertical concatenation of F 2-by-3 unitary matrices. The above matrix X is a product of two rank-3 matrices and, therefore, is of rank 3. Next, the matrix X is factorized to obtain P and S subject to the structure of P. Fortunately, the structure of matrix P does not need to be taken into account during factorization because, for every possible factorization (P_0, S_0), there is always another with the desired structure of the form (P_0 V, V^{-1} S_0) for some invertible matrix V_{3×3} and the same error value. In our numerical experiment, we randomly generate points S and camera planes P, form a distorted version of PS, and recover the points S. We perform this experiment for several numbers of tracking points M with F = 5 frames. The average is taken over 1000 trials. The data matrix X is considered to be heavily occluded by sporadic corruption as

    X = PS + 0.005 N + 5 Γ ⊙ N′

where ⊙ denotes Hadamard (element-wise) multiplication between matrices, the entries of N and N′ are independent and Gaussian-distributed, and Γ is a random {0, 1}-valued matrix indicating the locations of corruption. In our experiments, we set 93.33% of the entries of Γ to 0. Fig. 4 shows the degree of success in recovering S for several different numbers of tracking points, ranging from M = 50 to 320. The vertical axis describes the topological misfit, defined as ||S − Ŝ||_2, where Ŝ is the points' 3D coordinates as calculated by the four methods in comparison (alternating [7], greedy [8], proposed, and SVD [2]). Expectedly, due to the impulsive nature of the distortion, conventional L2-norm PCA exhibits poor performance (more than an order of magnitude higher error) compared to the three L1-norm minimizers. In addition, we notice that the proposed algorithm clearly outperforms the algorithms of [7] and [8] in recovering the geometry of the scene. The performance improvement attained by the proposed algorithm is profound for low numbers of tracking points.

Figure 4. Topological misfit versus number of points under impulsive noise occlusions.
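A minimal synthetic-data sketch of the SFM setup above follows (our own illustration; constants mirror the text). For brevity it shows only the L2/SVD rank-3 factorization, since any of the compared L1 methods could be substituted, and it sidesteps the 3 × 3 ambiguity V by comparing reconstructed products rather than S itself.

```python
import numpy as np

rng = np.random.default_rng(3)
F, M = 5, 50                                   # frames and tracking points

# Camera planes: F random 2x3 matrices with orthonormal rows, stacked into P.
P = np.vstack([np.linalg.qr(rng.standard_normal((3, 2)))[0].T for _ in range(F)])
S = rng.standard_normal((3, M))                # true 3D point coordinates

# Observation model from the text: small Gaussian noise plus sparse outliers.
Gamma = (rng.random((2 * F, M)) < 1 - 0.9333).astype(float)
X = P @ S + 0.005 * rng.standard_normal((2 * F, M)) \
          + 5.0 * Gamma * rng.standard_normal((2 * F, M))

# Rank-3 factorization (SVD baseline shown here): X ~ P0 S0 up to a 3x3 V.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P0 = U[:, :3] * s[:3]
S0 = Vt[:3, :]
print("misfit to the noiseless product:", np.linalg.norm(P0 @ S0 - P @ S))
```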

V. CONCLUSION

In this work, we focused on the fundamental problem of approximating a given matrix by another of lower rank by minimizing the L1-norm of the residual error. We presented a new theorem on the sparsity of the residual error matrix. We proved formally the NP-hardness of the problem and presented a novel algorithm for approximate solution that outperforms commonly used state-of-the-art schemes. The outlier resistance of the proposed algorithm has been documented by experimental studies on a motion-tracking application.

REFERENCES

[1] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. Baltimore, MD: The Johns Hopkins Univ. Press, 1996.

[2] C. Eckart and G. Young, "The approximation of one matrix by another of lower rank," Psychometrika, vol. 1, no. 3, pp. 211-218, Sep. 1936.

[3] M. Black and A. Rangarajan, "On the unification of line process, outlier rejection, and robust statistics with applications in early vision," Int. J. of Comput. Vision, vol. 19, no. 1, pp. 57-91, Jul. 1996.

[4] N. Srebro and T. Jaakkola, "Weighted low-rank approximations," in Proc. 20th Int. Conf. on Machine Learning (ICML), Washington, DC, 2003, pp. 720-727.

[5] C. Croux and P. Filzmoser, "Robust factorization of a data matrix," in Proc. Computational Statistics (COMPSTAT), Bristol, Great Britain, 1998, pp. 245-249.

[6] E. C. Rhodes, "Reducing observations by the method of minimum deviations," Phil. Mag. (7th Series), vol. 10, pp. 511-512, 1930.

[7] Q. Ke and T. Kanade, "Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming," in Proc. IEEE Conf. Comput. Vision Pattern Recog. (CVPR), San Diego, CA, Jun. 2005, pp. 739-746.

[8] J. P. Brooks, J. H. Dulá, and E. L. Boone, "A pure L1-norm principal component analysis," J. Comput. Stat. Data Anal., vol. 61, pp. 83-98, May 2013.

[9] C. Ding, D. Zhou, X. He, and H. Zha, "R1-PCA: Rotational invariant L1-norm principal component analysis for robust subspace factorization," in Proc. 23rd Int. Conf. on Mach. Learn. (ICML), 2006, pp. 281-288.

[10] P. P. Markopoulos, G. N. Karystinos, and D. A. Pados, "Optimal algorithms for L1-subspace signal processing," IEEE Trans. Signal Process., vol. 62, no. 19, pp. 5046-5058, Oct. 2014.

[11] P. P. Markopoulos, G. N. Karystinos, and D. A. Pados, "Some options for L1-subspace signal processing," in Proc. 10th Int. Symp. on Wireless Commun. Syst. (ISWCS), Ilmenau, Germany, Aug. 2013, pp. 622-626.

[12] N. Tsagkarakis, P. P. Markopoulos, and D. A. Pados, "Direction finding by complex L1-principal component analysis," in Proc. IEEE 16th Int. Workshop Signal Process. Adv. Wireless Commun. (SPAWC), Stockholm, Sweden, Jun. 2015, pp. 475-479.

[13] P. P. Markopoulos, N. Tsagkarakis, D. A. Pados, and G. N. Karystinos, "Direction finding with L1-norm subspaces," in Proc. Comp. Sens. Conf., SPIE Def., Security, Sens. (DSS), Baltimore, MD, May 2014, pp. 91090J-1-91090J-11.

[14] S. Kundu, P. P. Markopoulos, and D. A. Pados, "Fast computation of the L1-principal component of real-valued data," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, Italy, May 2014, pp. 8028-8032.

[15] P. P. Markopoulos, "Reduced-rank filtering on L1-norm subspaces," in Proc. IEEE Sens. Array Multichan. Signal Process. Workshop (SAM), Rio de Janeiro, Brazil, Jul. 2016.

[16] D. A. Pierre, Optimization Theory with Applications. New York, NY: John Wiley & Sons, 1969.

[17] P. M. Pardalos and S. A. Vavasis, "Quadratic programming with one negative eigenvalue is NP-hard," J. Global Optim., vol. 1, no. 1, pp. 15-22, Mar. 1991.

[18] C. Tomasi and T. Kanade, "Shape and motion from image streams under orthography: A factorization method," Int. J. of Comput. Vision, vol. 9, no. 2, pp. 137-154, Nov. 1992.
