This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
1
Projective Multi-view Structure and Motion from Element-wise Factorization Yuchao Dai, Student Member, IEEE, Hongdong Li, Member, IEEE, and Mingyi He, Member, IEEE Abstract—The Sturm-Triggs type iteration is a classic approach for solving the projective structure-from-motion (SfM) factorization problem, which iteratively solves the projective depths, scene structure and camera motions in an alternated fashion. Like many other iterative algorithms, the Sturm-Triggs iteration suffers from common drawbacks such as requiring a good initialization, the iteration may not converge or only converge to a local minimum, etc. In this paper, we formulate the projective SfM problem as a novel and original element-wise factorization (i.e., Hadamard factorization) problem, as opposed to the conventional matrix factorization. Thanks to this formulation, we are able to solve the projective depths, structure and camera motions simultaneously by convex optimization. To address the scalability issue, we adopt a continuation based algorithm. Our method is a global method, in the sense that it is guaranteed to obtain a globally-optimal solution up to relaxation gap. Another advantage is that, our method can handle challenging real-world situations such as missing data and outliers quite easily, and all in a natural and unified manner. Extensive experiments on both synthetic and real images show comparable results compared with the state-of-the-art methods. Index Terms—Element-wise factorization, projective structure and motion, semi-definite programming, missing data, outlier.
!
1
I NTRODUCTION
S
TRUCTURE and motion recovery from multi-view image feature correspondences is a fundamental problem in computer vision. The Tomasi-Kanade’s factorization framework [1] is one of the most influential methods in multi-view structure-from-motion (SfM). This approach is not only theoretically important, which reveals the deep and elegant mathematical structure of the SfM problem, but also practically useful, thanks to its conceptual and computational simplicity. Given a multiview measurement matrix M, the factorization approach simultaneously solves the (stacked) camera projection matrix P (the camera motions) and the 3D structure X, via a simple matrix factorization M = PX through a single Singular Value Decomposition (SVD). Its nice property also comes from the fact that it treats all points and all camera frames uniformly, no point or frame is deemed more “privileged” or “preferred” than the others. Originally, the Tomasi-Kanade’s factorization framework was developed for the orthographic camera with complete feature correspondences. In this paper, we revisit the projective extension of the factorization method in order to deal with perspective camera model. In particular, we are motivated by the most popular algorithmof-choice for projective factorization–the iterative Sturm• Yuchao Dai is with the School of Electronics and Information, Northwestern Polytechnical University, China, and with Research School of Computer Science, Australian National University, Australia. Email:
[email protected]. • Hongdong Li is with ANU/NICTA. E-mail:
[email protected]. • Mingyi He is with School of Electronics and Information, Northwestern Polytechnical University and ShaanXi Key Laboratory of Information Acquisition and Processing, Xi’an, China. Email:
[email protected].
Digital Object Indentifier 10.1109/TPAMI.2013.20
Triggs method1 [2], [3]. The basic idea of the SturmTriggs iteration is simple: compared with the affine camera case, now the basic projective imaging process can be compactly written as ΛM = PX, where Λ is a properly stacked but unknown projective depth matrix, denotes the element-wise (Hadamard) multiplication. Since the depth matrix Λ, the right-hand factorization P and X are unknowns, a natural approach to solve this problem is through iterative alternation till converge. Often, the initial point to start with is to assign all depths to be 1. In other words, this is equivalent to starting from affine camera model. Many other projective generalizations, such as [4], [5], [6], [7], share a similar computational pattern in terms of alternation. Like any other iterative algorithms, the Sturm-Triggs iteration and its extensions have some common drawbacks. For example, the iterative algorithms all require a good initial point to start with; the iteration procedure may not converge; even if it does converge (theoretically or empirically), it may only converge to a local minimum; global optimality can hardly be guaranteed. However, for the particular method of the SturmTriggs iteration and the alikes, all the aforementioned drawbacks did have been observed in all occasions. Indeed, Mahamud and Hebert [8] pointed out that the iterative Sturm-Triggs method with column-wise and row-wise normalization is not guaranteed to converge in theory. To salvage this problem, they proposed a column-wise only normalization and derived a provably-convergent iterative algorithm (called columnspace method) [8]. Oliensis and Hartley also observed situations where the iterations fell into a limiting cycle 1. Note that the original version of the Sturm-Triggs method reported in [2] is non-iterative, which crucially relies on accurate depth estimation from pair-wise epipolar geometry.
0162-8828/13/$31.00 © 2013 IEEE
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
and never converged [9]. Hartley and Zisserman [10] discussed when the popular choice of initialization– assuming all depths to be one–is a sensible choice for the iteration. They concluded that this assumption works only when the ratios of true depths of the different 3D point xj remain approximately constant during a sequence. Consequently, to make the Sturm-Triggs iteration work, the ground truth solution has to be rather close to the affine case. Even worse, a recent complete theoretical analysis delivers even more negative messages [9]. Oliensis and Hartley showed that (1) the simplest iterative extension of the Sturm-Triggs algorithm (SIESTA) without normalization/balancing will always converge to the trivial solution; (2) paper [8]’s provably-convergent iteration method will generally converge to a useless solution; (3) applying both column-wise and row-wise normalization may possibly run into unstable state during iteration. They also provided a remedy, i.e. a regularization-based iterative algorithm (called CIESTA) that can converge to a stable solution, albeit the solution is biased–biased towards all estimated depths being close to one. When the actual configurations for cameras and points are far away from affine configuration, the Sturm-Triggs algorithm and its extensions tend to fail to converge to good solution. Yet, it would be nicer if a projective factorization algorithm can be made free from these theoretical drawbacks and be useful in practice. Meanwhile, in most real-world SfM applications, due to self occlusions, points behind cameras (i.e., cheirality) or limitations of the feature trackers, missing data are inevitable. However the Tomasi-Kanade’s factorization framework and the Sturm-Triggs algorithm could not be extended to deal with missing data naturally. This constitutes a major drawback of factorization-based methods. Substantially, the unknown projective depths will make the problem much harder. Furthermore, outliers are unavoidable in real image projective reconstruction due to mismatches/mistracks (with the tracker unaware), causing measurement matrix to have some entries with gross errors (a corrupted measurement matrix). Even after some robust estimators such as RANSAC (Random Sample Consensus [11]) have been applied, there are still small proportion of outliers. Therefore a practical factorization framework should be able to handle complete measurements, incomplete measurements, and outliers elegantly. Our new method, to be presented next, is one of this kind. Part of the work was published in ECCV’10 [12]. However, this journal version contains nontrivial and substantial extensions, in both theory and practical solution techniques.
2
reconstruction method, another family of reconstruction methods is incremental or sequential reconstruction methods. Theses methods initialize from two or three views and add points and cameras via triangulation and camera resection. After adding each view, bundle adjustment [13] is used to refine the reconstruction. These methods have successfully built 3D models from large scale unordered Internet community image collections [14] [15] [16]. However, these incremental methods tend to be computationally intensive, making repeated use of bundle adjustment [13] as well as outlier rejection to remove inconsistent measurements. Additionally, these methods do not treat all images equally as in factorization framework, producing different results depending on the order of the photos being considered. Recently, Crandall et al. [17] showed that robust global objective functions can be used in SfM and optimized using discrete belief propagation techniques. Olsson and Enqvist [18] presented a non-incremental approach to SfM based on robustly computing global rotations from relative geometries and feeding these into the knownrotation framework to initialize bundle adjustment. 1.2
Outline of Our Approach
In this paper, we propose a unified framework for projective factorization, which stays away from all the above mentioned theoretical traps. Our method is global and no initial guess is needed. Our key idea is that: mathematically, instead of formulating the problem as matrix factorization, we reformulate the problem as element-wise factorization that solves the projective depth matrix Λ, projection matrix P and structure X simultaneously. This simple change offers many benefits in analyzing and solving the problem. Our method can handle complete measurements, incomplete measurements, outliers and degenerate case all in a natural and unified manner. The numerical implementation of our approach is efficient, as we formulate the problem as semi-definite programming (SDP) for which powerful off-the-shelf solvers are available. To address the scalability issue, we adopt an alternating direction continuation based algorithm, leading to much more efficient computations. The proposed algorithm is more suitable for real world large-scale SfM applications.
2
E LEMENT-W ISE FACTORIZATION
In this section, we present our element-wise factorization framework for projective multi-view structure and motion under complete measurement case. We analyze various balancing strategies. Additionally, we give analysis on the existence and uniqueness of our formulation.
1.1 Related Work in Scene Reconstruction Pipeline The factorization based structure and motion recovery method belongs to the family of batch reconstruction, where the goal is to reconstruct a scene in one step rather than in an incremental way. Besides the batch
2.1
Problem Statement
The problem of projective multi-view structure and motion seeks to simultaneously solve the unknown projective depths, the unknown camera motions and the
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
unknown structure. Compared with affine factorization, this is a much harder problem, mainly because depths are not known a priori. Of course, one can compute these projective depths by other means, e.g., via a string of fundamental matrices or trifocal tensors, via a reference plane, etc. However, such a two-step approach would diminish the elegance of the factorization algorithm. Consider n stationary 3D points xj = [xj , yj , zj , 1]T , j = 1, · · · , n observed by all m perspective cameras Pi , i = 1, · · · , m, under perspective camera model, the j-th 3D point xj is projected onto image point mij = [uij , vij , 1]T by mij =
1 Pi xj , λij
(1)
where λij is a scale factor, commonly called “projective depth”. It is easy to see that λij = 1 when the camera reduces to an affine camera. Collecting the image measurements over all frames, we form a measurement matrix M = [mij ] of size 3m × n. Now the relationship (1) can be compactly written as: 1 ) (PX), (2) M= ( λij where P ∈ R3m×4 and X ∈ R4×n are properly stacked projection matrix and structure matrix. Note that each row of the inverse depth matrix is repeated 3 times. We use a subscript “” to denote such a triple copy. Define W = PX as the rescaled (re-weighted) measurement matrix, we can equivalently re-write Eq. (2) as: ⎡ ⎤ P1 ⎢ ⎥
W = Λ M = PX = ⎣ ... ⎦ x1 · · · xn , (3) Pm
3
P and X, solve for Λ via least squares; (3) alternate between the above two steps till converge. In [2], the rescaled measurement matrix W is normalized through normalizing each column and triplet of rows iteratively. Triggs [3] added a third balancing stage to the SturmTriggs type iterative algorithms that rescales the projective depth λij to make them close to 1. This lessens the algorithm’s bias and helps to steer it away from trivial minima. The balancing can be done in two passes, by first scaling λij for each image i as λij → n αi λij , j=1 |λij |2 = n and then scaling λij for each point m j as λij → βj λij , i=1 |λij |2 = m, where αi , βj are scaling coefficients to determine. Optionally, this procedure can be repeated several times or iterated to converge. Convergence of this balancing strategy is stated in the following theorem: Theorem 2.1: For element-wise nonzero matrix Λ, the n 2 L2 norm constrained balancing strategy j=1 |λij | = m 2 n, i=1 |λij | = m is guaranteed to converge. In stead of using L2 norm balancing, we propose to apply L1 norm balancing strategy by nfirst scaling λij → αi λij for each image i such that j=1 |λij | = n and m then scaling λij → βj λij for each point j such that i=1 |λij | = m. When all the projective depths λij are positive, the above L1 norm balancing strategy reduces to the following column-sum and row-sum balancing strategy: n λij → αi λij , j=1 λij = n; m (4) λij → βj λij , i=1 λij = m.
where Λ = [(λij )] ∈ R3m×n , i.e. a triple copy. As P and X both have a rank at most 4, matrix W must have a rank at most 4. This is an essential constraint of projective factorization. Once the projective depth matrix Λ is recovered, projective factorization problem can be solved via SVD as W = PX.
For the L1 norm balancing strategy, we have the following convergence theorem: Theorem 2.2: For element-wise nonzero matrix n Λ , the L1 norm constrained balancing strategy j=1 |λij | = m n, i=1 |λij | = m is guaranteed to converge. Proofs of Theorem 2.1 and Theorem 2.2 are given in supplement material. Obviously, convergence of the column-sum and row-sum constrained balancing strategy is guaranteed in Theorem 2.2 too. Column-sum and row-sum balancing strategy will be incorporated with our formulation due to its linearity.
2.2 Balancing Strategies
2.3
In the above subsection, we have observed that the main difficulty for projective factorization lies in the recovery of projective depths. Obviously, there is an ambiguity in projective depths. If one family of projective depths λij yields a measurement matrix that can be decomposed into motion and structure, another family λij given by λij = μi ηj λij also yields a decomposable measurement matrix [4], where μi , ηj are arbitrary coefficients. Usually, to handle ambiguity and avoid possible trivial solutions (e.g., all depths being zero, or all but 4 columns of the depth matrix being zero, etc.), some kind of column-wise and row-wise normalization/balancing is necessary. The Sturm-Triggs type iterative algorithms [2], [3] solve the projective factorization problem through alternation: (1) fix Λ, solve for P and X via SVD; (2) fix
Here, we formulate the projective SfM problem as an element-wise factorization (i.e, Hadamard factorization) problem as opposed to the conventionally adopted matrix factorization.
Element-wise Factorization Formulation
2.3.1 Our Main Idea We repeat the basic equation for projective factorization: W = Λ M = PX. Recall that M is the only input, and the task is to solve for both Λ and W. Note that denotes element-wise multiplication, therefore we can view the problem as an element-wise factorization problem, i.e., • Given measurement matrix M, find two matrices Λ and W such that W = Λ M. At first glance, this seems to be an impossible task, as the system is severely under-constrained, i.e. the number
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
of unknowns to infer is twice as many as the given measurements in M. However, for the particular problem of element-wise factorization, we have extra conditions on the unknown matrices which may hopeful sufficiently constrain the system. Firstly, according to the projective imaging model W = PX, we have rank(W) ≤ min(rank(P), rank(X)) ≤ 4. Thus the re-scaled measurement matrix W has a rank at most 4. This is true for noise-free case (we will further relax this constraint in the actual computation). Under degenerate case, the rank of W will decrease in further. Secondly, the rank of Λ is also at most 4. This is easy to observe, since Λ is a sub-matrix of W, hence, rank(W) ≤ 4, rank(Λ) ≤ rank(W) ⇒ rank(Λ) ≤ 4. These two constraints decrease the degrees of the solution dramatically. Therefore, instead of finding two general condition matrices, we aim to find two low rank matrices W and Λ such that W = Λ M. Thirdly, all visible points’ projective depths must be positive. This is nothing but the cheirality constraint commonly used in multi-view geometry [10]. In other words, visible points must lie in front of the camera. Even noise and outliers do not violate this constraint. Fourthly, all the columns and rows of Λ have been normalized to have (average) unit sum that is i λij = m, j λij = n. These column-sum and row-sum constraints play two roles: (1) rule out trivial solutions such as all λij = 0, all columns but four are zeros and etc; (2) rule out scale ambiguity in the factorization. These constraints are in fact very mild, reasonable, and not restrictive. Roughly speaking, these extra conditions are expected to supply the system with sufficient constraints, making the element-wise factorization problem well-posed and hence solvable. A rigorous proof is provided in supplemental material. 2.3.2 Formulation Taking all the constraints into consideration, mathematically, element-wise factorization for projective multiview structure and motion under complete measurements case is formulated as: Find W, Λ, such that, W = Λ M, rank(W) ≤ 4, λ = m, j = 1, · · · , n, ij i λ = n, i = 1, · · · , m, ij j λij > 0.
(5)
We do not need to enforce the low rank constraint rank(Λ) ≤ 4 on projective depths as it is naturally embodied in low rank constraint on W. The constraint λij > 0 represents the cheirality constraint. The columnsum and row-sum constraints are included to avoid trivial solutions and deal with scale ambiguity. Complete analysis on the existence and uniqueness of solution to our formulation Eq-(5) are presented in supplemental material.
3
4
SDP F ORMULATION
In Section 2, we have formulated projective reconstruction problem as a novel element-wise factorization problem. To solve the element-wise factorization problem, we will relax it to nuclear norm minimization, which can be solved as an SDP. 3.1
Rank Minimization
Recently rank minimization has been widely utilized in computer vision for low rank fitting problem [19] [20] [21]. Under noiseless and complete measurement case, our element-wise factorization formulation Eq-(5) with rank constraint rank(W) ≤ 4 is equivalent to the following rank minimization problem: Minimize rank(W), subject to, W = Λ M, ΛT 1m = m1n , Λ1n = n1m , λij > 0.
(6)
where 1m , 1n are vectors with 1 element-wisely of length m and n respectively. Once this problem is solved, we can use Λ as the estimated depth matrix, and W as the rescaled measurement matrix. 3.2
Nuclear Norm Minimization
To solve rank minimization problem exactly is intractable in general, and in fact NP-hard [22]. To overcome this problem, nuclear-norm has been introduced as the tightest convex relaxation of the rank function. The nuclear norm of a matrix X ∈ Rm×n is defined as min(m,n)
X∗ =
σi (X),
(7)
i=1
where σi (X) is the i-th singular value of X. Recently, using nuclear norm minimization to solve rank minimization problem has received considerable attentions, in particular in the research area of compressive sensing, also this relaxation has been used in computer vision and machine learning [19], [20]. One surprising result is that for a large class of matrices satisfying some “incoherency” or “restricted isometry” properties, nuclear norm minimization actually gives an exact solution. In other words, the relaxation gap is zero. In this paper, we use the nuclear norm as a convex surrogate (a relaxation) to the rank function, mainly for the purpose of approximately solving our projective factorization problem. It is also worth noting that, long before the emergence of the compressive sensing research, nuclear-norm heuristics has already been known and widely used for solving rank minimization [23]. Theorem 3.1: The convex envelope2 of rank(X) on the set S = {X ∈ Rm×n , X ≤ 1} is the nuclear norm X∗ , where X = σ1 (X) is the operator norm of X [24]. 2. The convex envelope of f : C → R is defined as the largest convex function g such that g(x) ≤ f (x) for all x ∈ C. This means that among all convex functions, g is the one that is closest (point-wise) to f .
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
As our objective function aims to minimize the nuclear norm of W = Λ M, scaling of Λ will change the nuclear norm but not the rank. The column-sum and row-sum normalization constraints fix this scaling freedom and make all the solutions comparable. Using nuclear norm heuristics, we replace the original objective function rank(W) with W∗ and reach the following nuclear norm minimization formulation:
To illustrate the performance of nuclear norm minimization, one illustrative result is shown in Fig. 1 in logarithm. We demonstrate the singular value distribution for ground truth, affine approximation (λij = 1), result of our method and result of our method after low-rank projection. Our method outputs reprojection error 6.59 × 10−3 pixels before low-rank projection and 3.68 × 10−9 pixels after low-rank projection. Our method achieves almost “perfect recovery” result even the ground truth configuration is far away from affine approximation.
(8) 0
0
10
10
−5
Furthermore, the nuclear norm minimization min W∗ can be rewritten as an equivalent SDP problem:
10
−2
Singular Value
Singular Value
Minimize W∗ , subject to, W = Λ M, ΛT 1m = m1n , Λ1n = n1m , λij > 0.
5
−10
10
−15
10
−4
10
10
−6
10
1 min (tr(X) + tr(Y)) W 2 X W 0. s.t. WT Y
−20
10
0
5
10
15 Index
20
25
30
0
5
10
(a)
15 Index
20
25
30
20
25
30
(b) 0
0
10
10
−2
Z∗ =
min
U,V:Z=UVT
1 (U2F + V2F ). 2
(9)
If rank(Z) = k ≤ min(m, n), then the minimum above is attained at a factor decomposition Z = Um×k VTn×k . Combining everything together, we finally obtain a trace norm minimization formulation for projective structure and motion recovery: 1 min (tr(X) + tr(Y)) W 2 X W 0, s.t. WT Y W = Λ M, ΛT 1m = m1n , Λ1n = n1m , λij > 0.
(10)
This is a standard semi-definite programming (SDP), thus can be solved efficiently using any off-the-shelf SDP solvers such as SeDuMi [26] and SDPT3 [27]. Subsequently the solution can be fed into a single SVD to project the rank minimization solution to low-rank solution, or be used to initialize an iterative bundle adjustment process [13]. However, these state-of-the-art SDP solvers still cannot handle large scale problems, due to excessive memory and computational requirement. 3. Given the SVD of A = UΣVT , B is uniquely determined as UΣUT and C is uniquely determined as C = VΣVT . tr(B) + tr(C) = 2A∗ .
Singular Value
Singular Value
10
Such an equivalence is grounded on the following theorem (ref. [23]). Theorem 3.2: Let A ∈ Rm×n be a given matrix and t ∈ R, then we have A∗ ≤ t if and only if there exist two symmetric matrices B = BT ∈ Rm×m and C = CT ∈ Rn×n B A 0. 3 such that tr(B) + tr(C) ≤ 2t and AT C There are other formulations of the nuclear norm based on factorization of Z into two matrices U and V: Lemma 3.3: [22] [25] For any matrix Z, the following relationship holds:
−4
10
−6
10
−5
10
−10
10
−8
10
−15
−10
10
0
5
10
15 Index
20
25
30
10
0
5
10
(c)
15 Index
(d)
Fig. 1. Comparison of singular value distribution in logarithm. (a) Singular value distribution of ground truth. (b) Singular value distribution of affine approximation. (c) Singular value distribution of our solution before low-rank projection. (d) Singular value distribution of our solution after low-rank projection.
4
N UMERICAL I MPLEMENTATION
To address the scalability issue in the convex optimization, we design algorithms for large scale nuclear norm minimization based on recent progress in compressive sensing (such as [28], [29], [30], [31]). 4.1
Fixed Point Continuation
First, the Lagrangian version for Eq-(8) without enforcing balancing constraints is: 1 μW∗ + W − Λ M2F , (11) 2 where μ is a trade-off parameter. The residual entry 1 2 2 W − Λ MF can be equivalently expressed as: m n (mTij wij )2 1 1
2 T wij wij − , (12) W − Λ MF = 2 2 i=1 j=1 mTij mij where the equality originates from the fact that the λij minimizes the residual is defined as: λij = min wij − λij mij 2F = λij
T mij wij . T mij mij
(13)
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
1 μ (20) R+ ← arg min LRT − Λ M2F + R2F , R 2 2 1 Λ+ ← arg min LRT − Λ M2F Λ 2 + Γ, ΛT 1m − m1n + Υ, Λ1n − n1m (21) β + (ΛT 1m − m1n 2F + Λ1n − n1m 2F ), 2
Denote g(W) as the gradient of 12 W − Λ M2F at the position W, as wij influences residual 12 W−ΛM2F elementwisely, the gradient g(W) can be computed elementwisely as: g(wij ) =
mTij wij ∂ 12 wij − λij mij 2F = wij − T mij . (14) ∂wij mij mij
Our adopted fixed point continuation based algorithm for solving (11) can be expressed in the following twoline algorithm: (k) Y = W(k) − τ g(W(k) ), (15) W(k+1) = Sτ μ (Y(k) ), where τ is the step size in each iteration and Sv (·) is the matrix shrinkage operator [29]. To get rid of the trivial solution W = 0, after updating (k+1) = Y(k) , we firstly recover the projective depth λij (k) T
yij mij , mT ij mij
(k+1)
then we balance Λ(k+1) = [λij ] to satisfy the column-sum and row-sum constraints. This is done by iteratively column and row normalization. 4.2 Alternating Direction Continuation The above strategy works well in practice. However during each iteration, it involves computing SVD of the rescaled measurement matrix W of size 3m×n. To further speed up the implementation, we resort to Augmented Lagrangian Multiplier method which has been widely used in low-rank representation [31], [32]. Based on Lemma 3.3, we slightly modify our formulation Eq-(8) following the low rank parametrization [22]: 1 μ LRT − Λ M2F + (L2F + R2F ), 2 2 s.t. ΛT 1m = m1n , Λ1n = n1m , λij > 0,
Minimize
(16)
where L ∈ R3m×4 , R ∈ Rn×4 give the explicit low rank representation. In the formulation, we apply a continuation technique (i.e., homotopy approach) for accelerating the convergence of the algorithm. The parameter ημ determines the rate of reduction of consecutive μl , μl+1 = max{μl ημ , μ}, l = 1, · · · , L − 1.
(17)
To develop an efficient numerical solver for our formulation Eq-(16), we resort to the classic alternating direction method. We introduce Lagrangian multipliers Υ and Γ to remove the equality constraints. The resulting augmented Lagrangian formulation states as: L(Λ, L, R, Γ, Υ)
=
1 μ LRT − Λ M2F + (L2F + R2F ) 2 2 +Γ, ΛT 1m − m1n + Υ, Λ1n − n1m (18) β + (ΛT 1m − m1n 2F + Λ1n − n1m 2F ), 2
where β > 0 is a penalty parameter. We minimize Eq(18) with respect to L, R and Λ one at a time while fixing the others, and then update the Lagrangian multipliers Υ and Γ as: 1 μ L+ ← arg min LRT − Λ M2F + L2F , (19) L 2 2
6
Γ+ ← Γ + β(ΛT 1m − m1n ),
(22)
Υ+ ← Υ + β(Λ1n − n1m ),
(23)
β+ ← min(β, ρβ).
(24)
The update of L and R have closed-form solutions as: L+ = (Λ M)R(RT R + μI)−1 .
(25)
R+ = (Λ M)T L(LT L + μI)−1 .
(26)
The update of projective depth λij Eq-(21) has a closed-form solution too. However, the equation system is of size 3mn×3mn. To further speed up the implementation, we propose to avoid solving for this linear equation system. We compute the projective depth elementwisely from the updated rescaled measurement matrix W+ = L+ RT+ and measurements M as in Eq-(13). Then we balance the projective depth matrix Λ to satisfy the column-sum and row-sum constraints. In this way, we even do not need to update the Lagrangian multipliers Γ, Υ and β. Additionally, we only solve matrix inverse (small scale 4 × 4) instead of computing SVD of large scale matrix as apposed to FPC based implementation. Therefore, we obtain alternating direction continuation based Algorithm 1 for element-wise factorization under complete measurements case. As all the image measurements have been normalized first, we set μ0 = 1, and reduction rate ημ = 0.1 for complete measurements case. Alternating direction continuation method is exclusively used in experiments except being noted. 4.3
Theoretical analysis
In this subsection, we give theoretical analysis to our numerical implementations of proposed element-wise factorization formulation, i.e, computational complexity, convergence and relationship with existing work. 4.3.1 Computational Complexity The proposed algorithm has a fixed number of outer iteration (10 in all our experiments). During each outer iteration, it iteratively solves linear equation updating L, R and Λ. Here we give a demonstrative example to show the number of inner iteration in each outer iteration in Fig. 2(a), where the number of inner iteration dramatically decreases from around 40 to less than 5. To demonstrate the computational complexity of our algorithm, we conducted experiments under various number of points while fixing the number of cameras to 200 and also experiments under various number of cameras while fixing the number of points as 2000.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
Initialization: Given Λ(0) , set W(0) = Λ(0) M, compute In Fig.3, we illustrate the continuation property of the ¯ L(0) , R(0) through low rank projection, select , μ alternation direction continuation algorithm on Corridor and ημ , fix the sequence of μl as in Eq-(17). sequence. We give the continuation parameter sequence for μ = μ1 , μ2 , · · · , μL do μl in Fig.3(a), number of inner iteration during each outwhile Not converged do er iteration in Fig.3(b) and reprojection error after each Update L as L(k+1) = (Λ M)R(RT R + μI)−1 ; outer iteration in Fig.3(c). Empirically, with the decrease Update R as R(k+1) = (Λ M)T L(LT L + μI)−1 ; of continuation parameter μ, the solution decreases the T reprojection error and approaches the solution with μ = Update W as W(k+1) = L(k+1) R(k+1) ; 10 Update the projective depth element-wisely as 0. In our implementation, the final μ = 0.1 .
1:
2: 3: 4: 5: 6: 7:
600
30
500
25 20 15 10 5 2
3
4 5 6 7 Outer loop iteration count
8
9
10
(a)
600 Time consumption (Seconds).
35 Time consumption (Seconds)
Inner loop iteration count
Experimental results are illustrated in Fig. 2(b) and Fig. 2(c). Empirically, time consumption of our proposed algorithm grows linearly with respect to the number of points and the number of cameras.
400 300 200 100 0 0
2000 4000 6000 8000 10000 Number of points(The number of cameras is fixed as 200)
(b)
500 400 300 200 100 0 0 200 400 600 800 1000 Number of cameras (The number of points is fixed as 2000).
(c)
Fig. 2. Computational complexity of proposed algorithm. (a) Demonstrative example of outer and inner iteration, where 200 cameras observe 7000 points. (b) Time consumption under various number of points; (c) Time consumption under various number of cameras.
4.3.2 Convergence Our numerical implementations originate from elementwise factorization formulation, which avoids the trivial solutions discussed in [9]. By introducing column-sum and row-sum constraints, our element-wise factorization formulation and its numerical implementations at least rule out trivial solutions such as all but four columns of projective depths being zero, one row of projective depths being zero and etc. Note that the alternating direction continuation solution is low rank solution, while the SDP solution and fixed point continuation solution have to be projected to low rank subspace. Convergence analysis for general alternating direction continuation method and low rank parametrization have been given in [31], [33], [34] both theoretically and empirically.
100 90
0.08
80
0.07 0.06 0.05 0.04 0.03
70 60 50 40 30
0.02
20
0.01
10
0 1
2
3
4
5
6
7
Outer loop iteration count
8
9
10
(a)
4 3.5
Reprojection error (Pixels)
0.1 0.09
Inner loop iteration count
Continuation parameter µ
(k+1)
λij = (wTij mij )/(mTij mij ); 8: Balance the projective depth matrix Λ(k+1) under column-sum and row-sum constraints. 9: Compute stopping criteria (k+1) −W(k) F ε(k+1) = W . max{1,Wk F } (k+1) 10: if ε < then 11: Inner iteration converged; 12: end if 13: end while 14: end for Algorithm 1: Element-wise factorization via alternating direction continuation.
0 1
7
0 1
3 2.5 2 1.5 1 0.5
2
3
4
5
6
7
Outer loop iteration count
(b)
8
9
10
0 1
2
3
4
5
6
7
Outer loop iteration count
8
9
10
(c)
Fig. 3. Convergence of the alternating direction continuation method on Corridor sequence. (a) Continuation parameter μ; (b) Number of inner iteration during each outer iteration; (c) Reprojection error after each outer iteration. 4.3.3 Relationship with existing work In the above numerical implementations, the fixed point continuation algorithm minimizes nuclear norm through first order gradient descent. The alternating direction continuation algorithm is an adaption of augmented Lagrangian multiplier method to our element-wise factorization formulation, which solves the convex semidefinite programming problem with non-convex lowrank factorization form, enhancing computational speed while maintaining equivalence with SDP. It is based on the low rank parametrization for nuclear norm minimization [22], and has also been applied in low rank SDP [33] and matrix factorization [34]. PowerFactorization [35] also uses an alternated leastsquares scheme to solve for the motion and shape matrices, which has also been extended to solve rank constrained linear matrix equation [36]. Our proposed alternating direction continuation method is similar to PowerFactorization [35] apparently in the sense of alternation. The main differences are: 1) PowerFactorization originates from solving SVD with missing data through alternative least squares, while our alternating direction continuation method originates from low rank parametrization and augmented Largrangian multipliers method; 2) In our method, continuation step is introduced to speed up the implementation; 3) Balancing of the projective depth is enforced during each iteration to rule out trivial solutions.
5
E XTENSIONS
In the above sections, we have formulated and solved the problem of element-wise factorization under complete measurements case. In this section, we extend the
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
proposed framework to deal with missing data, outliers and degenerate case.
is achieved by slightly modifying (10): 1 min (tr(X) + tr(Y)) W 2 X W 0, s.t. WT Y Λ M = Ω W, λij > 0, if ωij = 1, λkl = 1, if ωkl = 0, ΛT 1m = m1n , Λ1n = n1m .
5.1 Dealing with Missing Data In most real-world SfM applications, missing data are inevitable, due to e.g. self-occlusion, points behind cameras (i.e., cheirality) etc. Missing data lead to an incomplete measurement matrix, however simple SVD cannot directly perform on an incomplete matrix. This constitutes a major drawback of factorization-based methods. For affine (camera) factorization, many missing-data handling ideas have been proposed as Buchanan and Fitzgibbon summarized in [37]. Angst, Zach and Pollefeys [25] proposed generalized trace norm and showed how to incorporate the prior on the camera motion and structure. Del Bue et al. [38] formulated the problem of bilinear factorization and solved the problem of SfM under affine camera model. Olsson and Oskarsson [39] proposed to deal with missing data using matrix completion which is similar to our work but restricted to affine camera model. Unfortunately, relatively less work was reported for projective factorization with missing data. Most existing works either rely on alternation [6], [40], or assume the depths are pre-computed by other means (reducing to affine case) [41], [42]. Some methods are heuristic or ad hoc, and have no longer the factorization flavor [43]. One of the reasons is probably the complexity of the problem: as we have seen, before introducing missing data, the unknown projective depths have already made the factorization framework very hard. In this subsection, we extend our element-wise factorization formulation for complete measurements to incomplete measurements case. Under the circumstance of projective factorization with missing data, we are given an incomplete measurement matrix M = [mij ] with missing data, where the missing elements are completed with 0. To index the missing pattern and elements, we define a 0 − 1 mask matrix Ω as 1 ∈ R3 , if mij is available, Ω = [ω ij ], ω ij = (27) 0 ∈ R3 , if mij is missing. With these notations, the projective imaging process with missing data can be compactly written as: Λ M = Ω W. Now our task becomes: • Given an incomplete measurement matrix M and the corresponding mask matrix Ω, find a completed lowrank matrix W and projective depth matrix Λ such that Λ M = Ω W. For those missing positions, we do not need to estimate the corresponding depths, so we set λij = 1 whenever ω ij = 0. This setting guarantees the column-sum and row-sum constraints on the available measurements and all the projective depths possess appropriate scales. Applying the nuclear norm heuristics, our SDP formulation for element-wise factorization with missing data
8
(28)
Once this SDP converges, the resultant W is a completed 3m × n full matrix with no entries missing. Moreover, we can even read out a completed projective depth matrix just as the sub-matrix of W formed by every the third rows of W. Our alternating direction continuation based algorithm can be extended to missing data, detailed analysis and algorithm are presented in supplemental material. 5.2
Dealing with Outliers
Another recurring practical issue in real world SfM applications is the outlier problem. Different from the missing data case, for the outlier case, we know that some entries of the measurement matrix M are contaminated by gross errors (i.e., wrong matches), but we do not know where they are. For conventional projective factorization methods, there is no easy and unified way to deal with outliers. Most published works are based on some pre-processing such as using RANSAC [11]. Ke and Kanade [44] formulated matrix factorization as an L1 norm minimization problem. Eriksson and Hengel [45] represented a generalization of the Wiberg algorithm calculating the lowrank factorization of a matrix in the presence of missing data. However, these methods are designed for general low-rank matrix approximation problem and work for affine camera model naturally. Here, we show how our element-wise factorization formulation can handle outlier problem nicely and uniformly. Under projective factorization framework, the input measurement matrix M is not of the form of lowrank + outliers but the rescaled measurement matrix W is of the form of low-rank + outliers. The main difficulty is that we have to estimate the projective depths that transform the input measurement matrix to low-rank + outliers which can be recast as low rank and sparsity decomposition problem [46], [47]. First, we give basic assumptions on outliers handling: 1) Outliers do not violate the cheirality constraint, which means even the outlier positions have positive projective depths; 2) Outliers are sparse. As pre-processing work such as RANSAC [11] or outlier removal has been applied, there is only small proportion of outliers in the measurement matrix. In this problem, we aim to recover the low rank rescaled measurement matrix W and sparse outlier pattern E given partial observable measurement matrix M, where the missing pattern is denoted as Ω. The basic
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
imaging equation can be compactly written as Ω (W + E) = Λ M. Our target is to find projective depth matrix Λ, rescaled measurement matrix W and outlier pattern E such that W possesses minimal rank and E is as sparse as possible. To quantify the sparseness of E, we use its L1 norm (X1 = i,j |Xij |) as a relaxation of its L0 -norm4 . Combine the nuclear-norm heuristics for W and L1 norm heuristics for E, the objective function is set as min W∗ + ηE1 , W,E
(29)
where η is a trade-off parameter. By introducing auxiliary matrix Z ∈ R3m×n , E1 is equivalent to tr(Z · 1n×3m ), −Zij ≤ Eij ≤ Zij , ∀(i, j) as in [46]. Taking all the constraints into consideration, the final formulation for element-wise factorization with outliers becomes: 1 min (tr(X) + tr(Y)) + ηtr(Z · 1n×3m ) W,E,Λ,X,Y,Z 2 X W s.t. 0, WT Y Ω (W + E) = Λ M, ΛT 1m = m1n , Λ1n = n1m , (30) −Zij ≤ Eij ≤ Zij , ∀(i, j), E3i,j = 0, i = 1, · · · , m, j = 1, · · · , n, Z3i,j = 0, i = 1, · · · , m, j = 1, · · · , n, λij > 0, if ωij = 1, λkl = 1, if ωkl = 0. where 1n×3m ∈ Rn×3m is a matrix having 1 as its every entry. The above model is thus an SDP formulation and can be solved using off-the-shelf solvers. Our alternating direction continuation based algorithm can be extended to handle outliers, detailed analysis and algorithm are presented in supplemental material. 5.3 Degenerate Case Since we have changed the goal from finding a rank ≤ 4 matrix to rank minimization, our method is applicable to other cases as well. For example, if all the scene points lie on a plane or the camera undergoes some special motion (such as planar motion), this constitutes a “degenerate” case causing rank < 4, which may hinder conventional factorization methods. However, our method is immune to these situations, as confirmed by our experiments.
6
E XPERIMENTAL R ESULTS
In this section, we provide extensive experimental results on both synthetic data and real images for complete measurements case, as well as incomplete measurements case, outlier case and degenerate case. Reprojection error in the image and relative projective depth error (if the ground-truth depths are known) are used to evaluate the performance. Relative depth error ˆ ΛF is defined as ε = Λ− ΛF , where Λ is the ground truth 4. We use E0 = Λ E0 , since λij > 0.
9
ˆ is the projective depth matrix after balancing and Λ recovered projective depth matrix after balancing. Once we have upgraded the projective reconstruction to Euclidean reconstruction (Euclidean upgrading is discussed in supplemental material), we can evaluate the performance of our algorithms in the aspect of 3D reconstruction. Given the ground truth 3D structure ˆ after registration, the 3D X, the recovered structure X ˆ−XF reconstruction error is defined as = XX . Denote the F ground truth focal length for the i-th frame as fi , the recovered focal length as fˆi , the focal length estimation ˆ i| error is defined as εi = |fif−f . The rotation estimation i trace(RT ˆ R )−1
i i error (angle) is defined as θi = acos( ), where 2 ˆi is the estimated Ri is the ground truth rotation and R rotation (after coordinate registration). All the experiments were conducted with our alternating direction continuation implementation except for outliers handling using Matlab R2012a without any optimization on a machine with Core i7 3.4 GHz processor and 8GB of memory under Windows 7 operating system. The image coordinate has always been normalized as suggested by [48].
6.1
Synthetic Experiments
6.1.1 Exact Recovery As we have relaxed the formulation from low-rank fitting problem to nuclear norm minimization, there is one question left: to what extent can our formulation and algorithms recover the structure and motion exactly? Theoretically, there is a relaxation gap as has been analyzed in the above sections. To cross this gap, we project the result to low rank space. In this part, we conduct extensive experiments on synthetic data and give statistical results. In these synthetic experiments, we randomly generated 100 points within a cube of [−30, 30]3 in space and 30 perspective cameras were simulated. The image size is set as 800 × 800. The camera parameters are set as follows: the focal lengths are set randomly between 900 and 1100, the principal point is set at the image center, and the skew is zero. To measure the variation of projective depths, we define the depth variation as r = maxij (λij )/ minij (λij ), i.e. the ratio between the maximal depth and the minimal depth after balancing. Statistical results on synthetic images are counted from more than 1700 random experiments as demonstrated in Fig. 4. We observe that no matter what level of the depth variation, our algorithm can always output results with reprojection error less than 3.5 × 10−6 pixels while most of the reprojection errors are less than 1 × 10−8 pixels, which can be determined as almost “exact recovery”. 6.1.2 Noise Sensitivity In this part, we provide experimental validation for the proposed algorithms in the aspect of reprojection error
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
10
0.3
Reprojection Error (Pixels)
0 Z
−4 −6 −5 0
X
5 10 −10
(b)
Fig. 4. Statistical result for our proposed method. (a) Distribution of the depth variation. (b) Reprojection error distribution of our method where local part is zoomed in.
6.1.3 Large Depth Variation As analyzed in the previous sections, for configurations near to affine camera model (λij = 1), iterative algorithms tend to succeed. However for configurations far away from this setting, these algorithms tend to fail in converging or converge to trivial solutions. In this part, we explore this aspect. We test on two cases, the first one is that all the depth variations are within [1, 5) i.e small variation, the other is that all the depth variations are within [5, 30] i.e. large variation. We compare our algorithm with state-of-the-art iterative methods as CIESTA [9], SIESTA with L2 norm balancing, SIESTA with L1 norm balancing and Column Space method [40]. To compare all these results, we use the form of reprojection error histogram, which gives the
4 2 0 0
0.6 0.5 0.4 0.3 0.2 0.1 0 0
1
2 3 Noise Level(Pixels)
0.2 0.15 0.1 0.05
1
2 3 Noise Level(Pixels)
4
0 0
5
1
(b)
0.7
(d)
and 3D reconstruction under various levels of Gaussian noise. To obtain dataset with ground truth, we use the Notre Dame dataset [14]. This set is a community photo collection consisting of 595 images and 277887 3D points and has been reconstructed and bundle-adjusted, resulting in estimates of all the camera matrices, which we take to represent ground truth. The real image measurements are highly sparse(less than 1%), we project all the points to all the cameras and choose the image measurements with positive depths. We utilize 50 cameras and 400 points, which is a middle size problem however SDP solvers such as SDPT3 and SeDuMi can not solve. The focal lengths vary from 600 pixels to 79820 pixels. Gaussian noise with levels ranging from 0 pixels to 5.0 pixels is added to the measurements. For each noise level, 50 trails are averaged to obtain statistical results. Fig. 5(a) shows one illustrative example of ground truth 3D structure and recovered structure after registration. Fig. 5(b) and Fig. 5(c) demonstrate the reprojection error and relative depth estimation error under various Gaussian noise levels. Fig. 5(d) demonstrates the relative 3D structure error. Fig. 5(e) demonstrates the rotation estimation error in degrees and Fig. 5(f) demonstrates the relative error in focal length estimation. From these figures, we observe that for noiseless data, our algorithm recovers the camera motion and structure exactly and all the evaluation metrics increase with the increase of Gaussian noise level.
5
0
6
0.25
10
Rotation Estimation Error(Degrees)
(a)
Y
8
(a)
Reprojection error (Pixels)
Relative 3D Structure Error (%)
Depth variation
−5
10
4
5
2.5 2 1.5 1 0.5 0 0
1
2 3 Noise Level(Pixels)
2 3 Noise Level(Pixels)
4
5
4
5
(c)
3
Relative Focal Length Error (%)
Frequency
Frequency
2
−2
Relative Depth Error (%)
12 4
4
5
(e)
14 12 10 8 6 4 2 0 0
1
2 3 Noise Level(Pixels)
(f)
Fig. 5. Performance evaluation of our method under various levels of Gaussian noise given complete measurements. (a) 3D Reconstruction comparison between the ground truth and recovered result; (b) Reprojection error; (c) Relative depth estimation error; (d) Relative 3D error; (e) Rotation estimation error; (f) Relative focal length estimation error.
distribution of reprojection error and shows statistical property for each algorithm. Statistical results on more than 1700 randomly configurations are illustrated in Fig. 6. Fig. 6(a) and Fig. 6(b) illustrate depth variation distributions for small depth variation and large depth variation respectively. Fig. 6(c) to Fig. 6(l) demonstrate the results for our proposed method, CIESTA, column space method, SIESTA with L1 balancing and SIESTA with L2 balancing under small and large depth variations respectively. As we expect, our method outperforms all state-of-the-art iterative methods by a significant margin. This can be explained that our method is a global solution and does not depend on initialization. However all other algorithms highly depend on initialization, where affine camera model is widely used as initialization which is not the case for large depth variation. 6.1.4
Synthetic Images: Missing Data
To evaluate the performance of our algorithm on incomplete measurements, we generated synthetic data sets as before followed by removing various proportions of 2D image measurements to simulate missing data. On the 50 × 400 Notre Dame synthetic dataset, we randomly generated the missing pattern according to uniform distribution, where the missing ratios vary from 10% to 70%. Gaussian noise is added to the measurements. For each missing ratio and Gaussian noise level, 50 trails are repeated. Statistical results for various noise levels under different missing ratios are demonstrated in Fig. 7. From the figure, we can reach the conclusion that for noiseless measurements, our algorithm can always recover the structure and motion perfectly no matter what level
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE
Depth variation
Depth variation
Frequency Reprojection error (Pixels)
(d)
Frequency
(f)
(h)
(i)
(j)
Frequency
Frequency
Reprojection error (Pixels)
Reprojection error (Pixels)
Reprojection error (Pixels)
(g)
Frequency
(e)
Frequency Reprojection error (Pixels)
Reprojection error (Pixels)
Reprojection error (Pixels)
Reprojection error (Pixels)
(c)
Frequency
(b)
Frequency
(a)
Frequency
11
Frequency
Frequency
Frequency
PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
Reprojection error (Pixels)
(k)
Reprojection error (Pixels)
(l)
Fig. 6. Performance comparison between our method and the state-of-the-art iterative methods under various level of depth variations. Local parts of the histograms are always zoomed in to give detailed comparison. (a) Histogram of small depth variation (1 ≤ r < 5); (b) Histogram of large depth variation (5 ≤ r < 30); (c) Performance of our proposed method under small depth variation; (d) Performance of our proposed method under large depth variation. (e) Performance of CIESTA under small depth variation; (f) Performance of CIESTA under large depth variation; (g) Performance of column space method under small depth variation; (h) Performance of column space method under large depth variation; (i) Performance of SIESTA L1 method under small depth variation; (j) Performance of SIESTA L1 method under large depth variation; (k) Performance of SIESTA L2 method under small depth variation; (l) Performance of SIESTA L2 method under large depth variation. the missing ratio is. For noisy measurements, the performances (reprojection error and structure estimation error) decrease with the increase of noise level and missing ratio.
15
Noise Level 0 Noise Level 1 Noise Level 2 Noise Level 3 Noise Level 4 Noise Level 5
Relative 3D Structure Error (%)
Reprojection Error (Pixels)
20
10
5
0 0
10
20
30
40
Missing Ratio (%)
(a)
50
60
70
2
1.5
Noise Level 0 Noise Level 1 Noise Level 2 Noise Level 3 Noise Level 4 Noise Level 5
1
0.5
0 0
10
20
30
40
50
60
70
Missing Ratio (%)
(b)
Fig. 7. Performance evaluation under various ratios of missing data. (a) Reprojection error VS Missing ratio; (b) Relative 3D structure error VS Missing ratio.
6.1.5 Synthetic Images: Outlier Handling To illustrate the performance of our method for projective factorization with outliers, we generated the following illustrative example first. The configuration is 15 cameras observing 25 points, leading to measurement
matrix of size 30 × 25. The outlier pattern, i.e. sparse component E is generated by choosing a support set of size k uniformly at random, E is a matrix with independent Bernoulli ±1 entries while the ground truth measurements are normalized √ coordinates with zero mean and standard variation 2. In the experiments, we set the outlier ratio as 5% and 10%, which is a realistic percentage of outliers, assuming that a method such as RANSAC [11] has been applied to remove most outliers. We evaluate the performance of outlier handling in the view of outlier detection. Precision and recall are used to qualify the performance, which are defined as follows: Recall =
TP TP , Precision = , TP + FN TP + FP
(31)
where TP is the true positive, FP is the false positive and FN is the false negative. Illustrative example is shown in Fig. 8, where the outlier ratio is 10% while 10% of the measurements are missing. Our method outputs result with Recall = 0.9473, Precision = 0.9643, the relative depth error is 6.00% while the relative error in low rank component recovery is 7.68%. Our algorithm not only detects the outliers
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
12
but also completes the measurements and recovers the projective depths.
(a)
(a)
(b)
(c)
(b)
(c)
Fig. 10. Real image sequences. (a) Teabox [49], (b) Corridor, (c) Dinosaur.
(d)
Fig. 8. Illustrative example for outliers and missing data handling, where the outlier ratio is 10% and the missing ratio is 10%. (a) Input incomplete measurement matrix, (b) Recovered measurement matrix, (c) Ground truth outlier pattern, (d) Recovered outlier pattern.
3 2 1 0 −1 −2 −3 −2
−4 −5
0
−6 −6
We conducted experiments under various outlier ratios and missing data ratios up to 30% to obtain statistical result. For each outlier ratio and missing data ratio combination, we repeat experiments 50 times. Statistical results of outliers and missing data handling simultaneously are demonstrated in Fig. 9. 1
−4
−2
(a)
0
2
4
6
2
(b)
Fig. 11. Real image chair sequence. (a) One frame of the sequence, (b) Euclidean upgrading result of the Chair sequence. The average value for nine right angles is 86.9832◦ .
1
Outlier Ratio 10% Outlier Ratio 5%
6.2.2
0.95
Real Images: Missing Data
0.9
0.8 0.85 0.75 0
5
10 15 20 Missing Ratio (%)
25
30
0
Outlier Ratio 10% Outlier Ratio 5% 5
(a) Measurement Recover Error (%)
Relative Depth Error (%)
Outlier Ratio 10% Outlier Ratio 5%
8 6 4 2 0 0
5
10 15 20 Missing Ratio (%)
25
30
25
30
(b)
12 10
10 15 20 Missing Ratio (%)
25
30
25 20
Outlier Ratio 10% Outlier Ratio 5%
15 10 5 0 0
5
(c)
10 15 20 Missing Ratio (%)
(d)
Fig. 9. Statistical result for outliers and missing data handling. (a) Outlier detection precision, (b) Outlier detection recall, (c) Projective depth recovery error, (d) Measurement recovery error.
We tested our algorithm for projective factorization with missing data on the dataset shown in Fig. 10(b) and 10(c). The Dinosaur sequence Fig. 10(c)5 is arguably the most popular dataset used to evaluate rigid SfM algorithms and is conventionally used as an example of affine factorization. Here we however solve it as a projective factorization with missing data problem. Our experiments considered the 36 × 319 measurement case. Experimental results are demonstrated in Tab. 2, where we gave the results of our proposed method and the results after bundle adjustment (denoted as “+BA”). 3D reconstruction results on the Dinosaur and Corridor sequences are demonstrated in Fig. 12. −0.5
80
Reconstructed 3D points Gound truth 3D points
−0.55
Reconstructed 3D points Ground truth 3D points
60
−0.6 Z
0.85
Z
Recall
Precision
0.95 0.9
−0.65
40 20
−0.7 0 20 −0.75 0.05 0 X −0.05
X 0 0.04
0.02
−0.02 −0.04 −0.06 −0.08 −0.1
0
−20
−10
−5
Y
6.2 Real Images Experiments
(a)
0
Y
5
10
15
(b)
6.2.1 Real Images: Complete Data In this part, we evaluate the proposed methods on real images, where the real images used for complete data are shown in Fig. 10. Reprojection errors for these images are illustrated in Tab. 1. Our method obtains comparable results if not superior.
Fig. 12. 3D reconstruction results. (a) Dinosaur, (b) Corridor.
We also implemented Euclidean upgrading for the Chair sequence. Orthogonal relationships in the original scene are measured on the reconstructed 3D scene as shown in Fig. 11. The average measured angle is 86.9832◦ for the 9 right angles in the scene.
Experimental results on degenerate scene (a calibration pattern [49]) are illustrated in Fig. 13. Fig. 13(a) shows one illustrative image of the calibration pattern and
6.2.3
Real Images: Degenerate Scene
5. Available at http://www.robots.ox.ac.uk/∼vgg/.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
13
TABLE 1 Performance evaluation for real image complete data (Reprojection error in pixels) Dataset Corridor(11 × 104) Teabox(2 × 16) Chair(6 × 9)
SIESTA [9] 0.4328 4.4820e-04 1.3584
CIESTA [9] 0.4296 4.4872e-04 1.3723
Col-space [8] 0.4501 5.1260e-04 1.4759
TABLE 2 Performance evaluation for real image with missing data (Reprojection error in pixels) Dataset Dinosaur(36 × 319, 76.92% Missing) Corridor(11 × 737, 50.23% Missing)
Proposed method 1.7510 0.4217
+BA 1.5893 0.4003
Fig. 13(b) illustrates the corresponding recovered balanced projective depth. Fig. 13(c) shows the performance comparison between our method, SIESTA, CIESTA and Column space method under various number of points. Finally, Fig. 13(d) gives the Euclidean reconstruction. Normalized Projective Depth
1.04
Reprojection Error (Pixels)
0.2
ACKNOWLEDGMENTS We would like to thank Prof. Richard Hartley for his invaluable support and discussions. We would like to thank the anonymous reviewers of TPAMI and ECCV 2010 for constructive comments that improves the manuscript. This work was supported in part by NSFC (60736007, 61171154), NSF of Shaanxi Province (2010JZ011), Australian Research Council (ARC) discovery grant and ARC linkage grant.
0.98
[2]
0.96
50
100 150 Feature Index
200
250
[3]
(b) 80
CIESTA SIESTA Column Space Proposed Method
these methods also work. In many other practical cases when these conventional methods do not work well, our method still produces satisfactory results.
[1]
1
(a) 0.25
[4]
60 40 20 0
[5]
0.15 −20 −40
0.1
−60 −80
0.05
50
100 150 200 Number of Points
(c)
250
−100 −50
0
50
100
150
(d)
Fig. 13. Experimental results on degenerate scene. (a) Plane1; (b) Recovered projective depth for Plane1; (c) Performance comparison under various number of points; (d) Reconstruction of the plane. The reconstruction error occurs due to radial distortion.
[6] [7] [8] [9]
[10]
7
Our method + Bundle Adjustment 0.3763 3.8071e-4 1.2545
R EFERENCES
1.02
0.94 0
Our method 0.4327 4.4820e-4 1.3568
C ONCLUSIONS
In this paper, we have proposed a new method for projective multi-view structure and motion factorization, using a novel element-wise factorization formulation. We represent the problem as an SDP and solve it efficiently and (approximately) globally. Furthermore, we adopt fixed point continuation based and alternating direction continuation based algorithms to address the scalability issue, which prepare our method well for practical large scale applications. The results obtained by our new method are either comparable with or superior to the conventional iteration-based methods–when
[11]
[12] [13] [14] [15] [16]
C. Tomasi and T. Kanade, “Shape and motion from image streams under orthography: a factorization method,” Int. J. Comput. Vision, vol. 9, no. 2, pp. 137–154, 1992. P. Sturm and B. Triggs, “A factorization based algorithm for multiimage projective structure and motion,” in ECCV, 1996, pp. 709– 720. B. Triggs, “Factorization methods for projective structure and motion,” in CVPR, 1996, pp. 845–851. T. Ueshiba and F. Tomita, “A factorization method for projective and Euclidean reconstruction from multiple perspective views via iterative depth estimation,” in ECCV, 1998, pp. 296–310. A. Heyden, R. Berthilsson, and G. Sparr, “An iterative factorization method for projective structure and motion from image sequences,” Image Vision Comput., vol. 17, no. 13, pp. 981 – 991, 1999. Q. Chen and G. Medioni, “Efficient iterative solution to M-view projective reconstruction problem,” in CVPR, 1999, pp. 55–61. M. Han and T. Kanade, “Perspective factorization methods for Euclidean reconstruction,” Robotics Institute, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-99-22, August 1999. S. Mahamud and M. Hebert, “Iterative projective reconstruction from multiple views,” in CVPR, 2000, pp. 430–437. J. Oliensis and R. Hartley, “Iterative extensions of the Sturm/Triggs algorithm: Convergence and nonconvergence,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 12, pp. 2217–2233, 2007. R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed. Cambridge University Press, 2004. M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM, vol. 24, no. 6, pp. 381–395, June 1981. Y. Dai, H. Li, and M. He, “Element-wise factorization for n-view projective reconstruction,” in ECCV, 2010, pp. 396–409. B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment - a modern synthesis,” in Proc. International Workshop on Vision Algorithms, 2000, pp. 298–372. N. Snavely, S. M. Seitz, and R. Szeliski, “Modeling the world from internet photo collections,” Int. J. Comput. Vision, vol. 80, no. 2, pp. 189–210, 2008. S. Agarwal, N. Snavely, I. Simon, S. Seitz, and R. Szeliski, “Building Rome in a day,” in ICCV, 29 2009-oct. 2 2009, pp. 72 –79. J.-M. Frahm, P. Fite-Georgel, D. Gallup, T. Johnson, R. Raguram, C. Wu, Y.-H. Jen, E. Dunn, B. Clipp, S. Lazebnik, and M. Pollefeys, “Building Rome on a cloudless day,” in ECCV, 2010, pp. 368–381.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE PROJECTIVE MULTI-VIEW STRUCTURE AND MOTION FROM ELEMENT-WISE FACTORIZATION
[17] D. Crandall, A. Owens, N. Snavely, and D. Huttenlocher, “Discrete-continuous optimization for large-scale structure from motion,” in CVPR, june 2011, pp. 3001 –3008. [18] C. Olsson and O. Enqvist, “Stable structure from motion for unordered image collections,” in SCIA. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 524–535. [19] T. Ding, M. Sznaier, and O. I. Camps, “A rank minimization approach to video inpainting,” in ICCV, 2007, pp. 1–8. [20] Y. Peng, A. Ganesh, J. Wright, W. Xu, and Y. Ma, “RASL: Robust alignment by sparse and low-rank decomposition for linearly correlated images,” in CVPR, 2010, pp. 763–770. [21] M. Ayazoglu, M. Sznaier, and O. Camps, “Euclidean structure recovery from motion in perspective image sequences via Hankel rank minimization,” in ECCV, 2010, pp. 71–84. [22] B. Recht, M. Fazel, and P. A. Parrilo, “Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization,” SIAM Review, vol. 52, no. 3, pp. 471–501, 2010. [23] M. Fazel, H. Hindi, and S. Boyd, “A rank minimization heuristic with application to minimum order system approximation,” in Proceedings of the American Control Conference, 2001, pp. 4734–4739. [24] M. Fazel, “Matrix rank minimization with applications,” Ph.D. dissertation, Stanford University, 2002. [25] R. Angst, C. Zach, and M. Pollefeys, “The generalized trace-norm and its application to structure-from-motion problems,” in ICCV, 2011, pp. 2502 –2509. [26] J. Sturm, “Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones,” Optimization Methods and Software, vol. 11, pp. 625–653, 1999. [27] K. Toh, M. Todd, and R. Tutuncu, “SDPT3 — a Matlab software package for semidefinite programming,” Optimization Methods and Software, vol. 11, pp. 545–581, 1999. [28] J.-F. Cai, E. J. Cand`es, and Z. Shen, “A singular value thresholding algorithm for matrix completion,” SIAM J. on Optimization, vol. 20, no. 4, pp. 1956–1982, 2010. [29] S. Ma, D. Goldfarb, and L. Chen, “Fixed point and Bregman iterative methods for matrix rank minimization,” Math. Program., Ser. A, vol. 128, no. 1-2, pp. 321–353, 2011. [30] E. Cand`es, X. Li, Y. Ma, and J. Wright, “Robust principal component analysis?” J. ACM, vol. 58, no. 3, pp. 11:1–37, 2011. [31] Z. Lin, M. Chen, and Y. Ma, “The Augmented Lagrange Multiplier Method for Exact Recovery of Corrupted Low-Rank Matrices,” ArXiv e-prints, Sep. 2010. [32] R. Liu, Z. Lin, F. De la Torre, and Z. Su, “Fixed-rank representation for unsupervised visual learning,” in CVPR, 2012, pp. 598–605. [33] S. Burer and R. D. Monteiro, “A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization,” Mathematical Programming, vol. 95, pp. 329–357, 2003. [34] K. Mitra, S. Sheorey, and R. Chellappa, “Large-scale matrix factorization with missing data under additional constraints,” in NIPS, vol. 23, 2010, pp. 1642–1650. [35] R. Hartley and F. Schaffalitzky, “PowerFactorization : 3D reconstruction with missing or uncertain data,” in Japan-Australia Workshop on Computer Vision, 2004, pp. 1–9. [36] J. Haldar and D. Hernando, “Rank-constrained solutions to linear matrix equations using powerfactorization,” Signal Processing Letters, IEEE, vol. 16, no. 7, pp. 584 –587, 2009. [37] A. M. Buchanan and A. W. Fitzgibbon, “Damped newton algorithms for matrix factorization with missing data,” in CVPR, 2005, pp. 316–322. [38] A. Del Bue, J. Xavier, L. Agapito, and M. Paladini, “Bilinear modeling via augmented lagrange multipliers (BALM),” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 8, pp. 1496 –1508, aug. 2012. [39] C. Olsson and M. Oskarsson, “A convex approach to low rank matrix approximation with missing data,” in Scandinavian Conference on Image Analysis, 2009, pp. 301–309. [40] S. Mahamud, M. Hebert, Y. Omori, and J. Ponce, “Provablyconvergent iterative methods for projective structure from motion,” in CVPR, 2001, pp. 1018–1025. [41] D. Martinec and T. Pajdla, “Structure from many perspective images with occlusions,” in ECCV, 2002, pp. 355–369. [42] H. Jia and A. M. Martinez, “Low-rank matrix fitting based on subspace perturbation analysis with applications to structure from motion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 5, pp. 841–854, 2009. [43] D. Martinec and T. Pajdla, “3d reconstruction by fitting low-rank matrices with missing data,” in CVPR, 2005, pp. 198–205.
14
[44] Q. Ke and T. Kanade, “Robust L1 norm factorization in the presence of outliers and missing data by alternative convex programming,” in CVPR, 2005, pp. 739–746. [45] A. Eriksson and A. v. d. Hengel, “Efficient computation of robust low-rank matrix approximations in the presence of missing data using the L1 norm,” in CVPR, 2010, pp. 771–778. [46] V. Chandrasekaran, S. Sanghavi, P. Parrilo, and A. Willsky, “Ranksparsity incoherence for matrix decomposition,” SIAM J. Optim., vol. 21, no. 2, pp. 572–596, 2011. [47] M. Tao and X. Yuan, “Recovering low-rank and sparse components of matrices from incomplete and noisy observations,” SIAM J. on Optimization, vol. 21, no. 1, pp. 57–81, 2011. [48] R. Hartley, “In defense of the eight-point algorithm,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, no. 6, pp. 580–593, 1997. [49] Z. Zhang, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 11, pp. 1330–1334, 2000.
Yuchao Dai is currently a Postdoctoral Fellow with the Research School of Computer Science at the Australian National University, Canberra, Australia. He received the B.E. degree, M.E degree and Ph.D. degree all in signal and information processing from Northwestern Polytechnical University, Xian, China, in 2005, 2008 and 2012, respectively. He was a visiting student at ANU from Oct. 2008 to Oct. 2009 with the support of the China Scholarship Council. His research interests include structure from motion, multiview geometry, compressive sensing and optimization. He won the best paper award in CVPR 2012.
Hongdong Li (M’01) is a Fellow with the Australian National University. He was also seconded to NICTA (National ICT Australia) as a Senior Research Scientist. His research interests include geometric computer vision, structure from motion, image restoration and bionic eyes, and optimization methods. He received MSc. and Ph.D. degree in Electronics Engineering from Zhejiang University in 2000. After teaching in the same university for three years he joined the ANU/NICTA teaching and doing researches in Computer Vision. He constantly serves on the Programme Committees (or as a reviewer) for all the top interntional Computer Vision conferences and journals, such as ICCV, CVPR, ECCV, PAMI and IJCV. He is an Area Chair for ICCV’13. He was a recepient for the CVPR’12 Best Paper Award.
Mingyi He received the B.Sc. and M.Sc. degrees in electronic engineering from Northwestern Polytechnical University, Xian, China, in 1982 and 1985, respectively, and the Ph.D. degree in signal and information processing from Xidian University, Xian, in 1994. Professor He is the Founder and Director of the Shaanxi Key Laboratory of Information Acquisition and Processing, a professor in School of Electronics and Information, and the Director and Chief Scientist of the Center for Earth Observation Research, Northwestern Polytechnical University. He won 9 scientific prizes from China and the best paper award in CVPR 2012. He has published over 280 papers and 5 books and has made contributions to hyperspectral remote-sensing data processing, visual information processing, neural networks, and 3D technique.