A Unified Approach for Registration and Depth in Depth from Defocus

Rami Ben-Ari

Abstract—Depth from Defocus (DFD) suggests a simple optical set-up to recover the shape of a scene through imaging with a shallow depth of field. Although numerous methods have been proposed for DFD, less attention has been paid to the particular problem of alignment between the captured images. The inherent shift-variant defocus often prevents standard registration techniques from achieving the accuracy needed for successful shape reconstruction. In this paper, we address the DFD and registration problem in a unified framework, exploiting their mutual relation to reach a better solution for both cues. We draw a formal connection between registration and defocus blur, find its limitations and reveal the weakness of the standard isolated approaches of registration and depth estimation. The solution is approached by energy minimization. The efficiency of the associated numerical scheme is justified by showing its equivalence to the celebrated Newton-Raphson method and by a proof of convergence of the emerging linear system. The computationally intensive DFD, newly combined with simultaneous registration, is handled by GPU computing. Experimental results demonstrate the high sensitivity of the recovered shapes to slight errors in registration and validate the superior performance of the suggested approach over two alternatives that apply registration and DFD separately.

Index Terms—3D reconstruction, registration, focus sensing, depth from defocus, extended depth of field, GPU computing
1 Introduction
An essential goal of visual inference is to provide estimates of the scene structure from images. It is well known that extracting shape from a single view is an ill-posed problem. Reliable approaches therefore assume having several views of the scene to recover the shape. Common triangulation methods such as stereo vision [4], [15], [26], structured light [12], or structure from motion [31] assume a pinhole image formation model, ignoring the lens blur. Imaging through a shallow depth of field creates a varying blur (defocus) that encodes the shape of the scene. A single image is not sufficient for depth estimation due to the unknown latent all-in-focus image of the scene. Depth from Defocus (DFD) thus attempts to extract depth from (at least) two images, captured at different focus settings [7]–[9], [29], [32], [33]. In DFD, one attempts to solve the inverse problem, i.e., to recover the shape that best agrees with the observations according to the image formation model. Closely related methods are multi-aperture photography [11] and confocal stereo [14].
1.1 Problem Statement

In standard DFD, each point in the scene gains two unique observations. The two observations are then compared in order to estimate the depth at that point. While in aligned images the corresponding points lie at the same coordinates,
when the images are misaligned this correspondence is lost. DFD requires the scene to have a textured pattern (high frequencies). Hence, in most cases a slight misalignment drastically impacts the recovered shape, particularly due to sampling and quantization. Fig. 1 demonstrates the degeneracy of the reconstructed shape under mild misalignments. Serious effort is therefore made to register the input images prior to depth evaluation [8], [32], [33]. The problem of accurate registration in DFD was previously addressed in [32], [33], where the authors discuss alignment to within a 0.1 pixel error for images captured in a controlled lab set-up. The alignment problem was also addressed in confocal stereo [14], where an optical model was developed to achieve sub-pixel accuracy.

Registration is a well-established domain in computer vision. Common methods today are able to align images even between different modalities (see [35] and the references therein). In an ideal case, the algorithm should be able to detect the same features (corners, curvatures or intensities) in all projections of the scene, regardless of the particular geometric and photometric transformations. However, the registration accuracy is bounded by several factors such as discretization (sampling), photometric transformation, blur or noise [23]. A basic approach defines the registering transformation by best matching the dense appearance pattern between the observations. Other methods match a sparse set of points, called interest points, to better cope with geometric and photometric transformations. A well-known set of interest points is suggested by SIFT (Scale Invariant Feature Transform) [19]. While dense appearance registration can cope with relatively smooth patterns under low contrast, sparse methods require the existence of unique cues in the image (usually corners), and offer certain invariance with respect to zoom, rotation
Fig. 1. Effect of misalignment on 3D reconstruction from DFD. Top: A synthetic case of a slanted plane, misaligned by a translation of (1,1) pixels. Bottom: Real images, warped here by a translation of (1.8, 2.3) pixels to create misalignment. On the left, a sample image (of two), with the 3D reconstruction in the middle column. The shape reconstruction from the registered set is shown at right for comparison. Note the distortions associated with rather slight displacements.
and translation. However, the aforementioned approaches rarely cope with defocus blur adequately. Registration of differently blurred images remains a particular challenge. To deal with this problem, focal stack methods often abandon the standard observation-based registration approaches and turn to calibration, using specified targets on a controlled device [16]. The problem of registration in DFD escalates when an asymmetric defocus kernel is involved. In this case, registration features such as edges or corners are smeared directionally, resulting in an incorrect alignment. Fig. 2 indicates the difficulty of precise alignment of DFD sets, as sharp image features are matched with their blurred counterparts. Results are demonstrated in Fig. 2 for SIFT-based registration.

Current DFD methods separate the registration and depth evaluation stages [8], [9], [32], [33], ignoring the mutual relation between them. On the one hand, knowing the local blur (namely depth) can be utilized to match the observations, resulting in improved registration. On the other hand, exact registration is a necessary condition for accurate depth (blur) estimation. This coupled nature of registration and depth argues for a combined approach that drives a unified cost function toward a global minimum. The suggested paradigm demands that both registration and depth be satisfied simultaneously.
1.2 Previous Work

DFD is an active area of research with two main approaches: frequency (Fourier) methods [17], [33], [35] and spatial methods [7], [9]. In the frequency domain one often seeks the latent radiance image (the map of a scene's view where all points are in focus) by deblurring the observations with a discrete range of blur kernels. The scale of the local kernel, indicating depth, is then obtained by minimizing an objective functional that satisfies the image formation model. The need for sufficiently large domains to estimate local frequencies, as well as the discrete range of depths,
often result in errors in regions of varying depth and at depth discontinuities. The most popular minimization strategy for continuous modeling is the variational framework [34]. Variational methods in DFD suggest a spatial approach, allowing for a dense and continuous solution, while providing a well-based mathematical formalism that explicitly expresses the model assumptions. The joint optimization of data compliance and spatial smoothness efficiently fills in data values in ambiguous regions, while preserving discontinuities.

In this paper, we formulate our model for registration and depth in a single energy functional. To facilitate the model, avoiding the recovery of the radiance map, we encode the depth by relative blur [8], [9] and assume registration by a similarity transformation. This transformation applies to a rigid motion of the camera/scene, perpendicular to the optical axis, while the scale complies with the optical magnification. Our theoretical analysis shows that regardless of the transformation, one can still relate the observations, modeling the DFD problem by relative blur as in [2], [8]. The suggested numerical solution is in the spirit of [5], [8] and is based on linearization of the Euler-Lagrange finite difference scheme, solved by nested (lagged) iterations. We furthermore follow a pyramidal coarse-to-fine approach to better cope with local minima in depth and with large misalignments. However, we suggest a slight modification to the previous approach of [9] to improve its stability. New theoretical observations justify our approach by showing the equivalence of our scheme to the celebrated Newton-Raphson method. Finally, a proof of convergence of the emerging linear system establishes our numerical stability.

The registration in DFD plays the same role as weak calibration in stereo vision, where the fundamental matrix is eventually recovered. In stereo, images are captured with a wide depth of field (therefore approximated by the pinhole model) and are free of (depth varying) blur. These conditions allow typical registration methods to be suitable for stereo weak calibration. Nonetheless, the DFD problem imposes a new challenge for registration due to the shift-varying blur and the high sensitivity to registration errors (in the order of a fraction of a pixel). Combining depth estimation with registration was therefore previously found to be a practical solution [21]. However, as opposed to our approach, in [21] the blur is assumed to be constant in space, restricting that model to equifocal surfaces, far from the general case.
Fig. 2. Effect of defocus on registration, shown on two pairs of images captured/simulated at two different focus positions. Left: A patch from a real image set used for DFD, describing part of a spider. Note that in registration the blurred features need to precisely match their sharp counterparts (and vice versa). Right: A synthetic random dot pattern, uniformly blurred by an asymmetric kernel. In the left image of each pair, the overlaid red circles are SIFT key points, while in the right image these points are mapped according to the SIFT registration. For comparison, the blue plus signs are our registration results, leading to improved shape recovery as shown in Section 5.
1.3 Parallel Implementation

Real-time performance of a 3D reconstruction method is commonly a desired property; often, a computationally expensive approach prevents a method from spreading and becoming popular. Triangulation methods such as stereo vision are popular depth estimation techniques, offering acceptable accuracy and real-time capability. However, such methods are not easily applicable in many scenarios. Previously, a real-time DFD system was suggested by Watanabe et al. [32] using active illumination to allow for computationally cheap decoding. While introducing real-time performance (on VGA image size), this approach required an additional projecting system and was found to be highly sensitive to the object reflectance properties. In this study, we address the problem of passive DFD, where no light emitters are involved, coping with the associated intensive computation by parallel processing on a GPU. Our implementation computes a 1 MPixel dense depth map in just 2 seconds, while conducting self-registration. The computational rate is accelerated by an order of magnitude if DFD is run without registration. These figures are obtained with a CUDA-C implementation of our fast converging numerical scheme, run on a desktop NVIDIA GPU. To the best of our knowledge, this is the highest DFD computation rate reported in the literature.
1.4 Contribution and Experimental Tests

Our contribution is reflected in several important aspects, as specified below:

1) Accurate registration of images captured with a shallow depth of field introduces a challenge due to the varying blur. The reciprocal relation between registration and blur is exploited here to improve the accuracy of both alignment and shape recovery. The suggested approach presents a self-calibrating method (in terms of weak calibration in stereo vision) for the estimation of a depth indicator, similar to disparity in stereo vision.
2) The relative blur model has been acknowledged as a reliable approximation for DFD, resulting in a plausible and stable depth indicator (see also the recent work in [18]). However, the conditions under which the relative blur model applies to misaligned images have not yet been shown. In this paper, we formally derive the relative blur model in the presence of misalignment, exposing its limitations. This approach can also be used for accurate registration of optically defocused images.
3) A new exploration enhances and justifies the stability of previously suggested numerical schemes. In this regard, the convergence of the emerging inner fixed-point iteration scheme is proved.
4) A newly derived parallel scheme for DFD allows an efficient implementation on GPU. To the best of our knowledge, this presents the first real-time and self-calibrating DFD in the literature.

Our experimental tests demonstrate the limitation of standard appearance- and feature-based approaches in accurate registration of images captured with a shallow depth of field. The test-bed includes synthetic as well as real data, with quantitative and qualitative comparison. Most of the currently available datasets are indoor images, often captured with controlled illumination in a lab facility. We hereby introduce new data captured in the wild and provide it for public use. The results show successful recovery of depth, despite the misalignment involved (additional figures and content are provided on the project page, www.cs.bgu.ac.il/~rba/dfd/DFDProjectPage.html, to meet length requirements). We further compare our method to a dense appearance-based registration and to the widely used SIFT matching. The improved results of the suggested approach justify our paradigm.

1.5 Paper Organization

This paper is organized as follows: In Section 2 we begin by describing the image formation model for misaligned
data sets. Section 3 then describes the objective functional, unifying registration and blur under a single coupled model. The numerical scheme for energy minimization is then described in Section 4, including modifications for a massively parallel scheme. Section 5 then presents experimental evaluations made on a diverse set of data, with a comparison to two alternatives based on separate registration and depth evaluation. Finally, the paper is concluded in Section 6 with a summary and discussion.
2 Image Formation Model
Let us define the latent all-in-focus view of the scene, namely the radiance image, by r: Ω ⊂ R² → [0, ∞), with Ω denoting a compact support. We consider two images of the scene captured under the same illumination regime but with different focus settings. Our first observation, I1: Ω → [0, ∞), is formed by imposing the defocus operator D1 on the radiance image:

I1(x) = D1(z(x)) ⋆ r(x)   (1)

Note that for a certain camera, the defocus operator depends on the scene's shape, z: Ω → (0, ∞), and on the optical setting, indicated here by the subscript. As the optical setting, we refer to the position of the focal plane in the scene, or the distance between the image sensor and the lens in the camera. The notation ⋆ denotes a shift-variant linear operation, to be defined in Section 2.1. We make use of 3D coordinate axes (X, Y, Z), where Z coincides with the optical axis of the camera and (X, Y) span the perpendicular plane. While previous DFD models assume perfect alignment between the images, we allow for a relative displacement and rotation of the camera/object prior to the capture of the second image (the first image is chosen as the reference image, without loss of generality). An equifocal and planar motion imposes a displacement in the (X, Y) plane and a rotation around the optical axis (Z). On top of this geometric transformation there is an optical scale, often referred to as magnification [27]. The resulting transformation can then be described by a similarity map, P. Note that the scale is constant and not depth dependent, according to the geometric optics model (see the proof in Appendix D.1 of [10]). Under these assumptions, the image formation for our second image emerges as:

I2 = D2(z(Px)) ⋆ r(Px)   (2)

While the defocus operator is attached to the camera frame, the shape and the radiance map are both embedded in the object frame.
2.1 Defocus and Geometric Transformation

Defocused images are often described by linear models and applied by local convolution. To facilitate the problem, we adopt a parametric approach, assuming the shape of the kernel h to be known and the depth encapsulated in the kernel's scale: h: Ωλ × (0, ∞) → [0, 1), with Ωλ ⊂ R² denoting a compact support and z standing for a point-wise depth. This modeling reflects aberration-free optics, where the point-wise defocus kernel is determined by the corresponding scene depth and is explicitly independent of the spatial coordinates (in practice, residual spatial dependence can be compensated for by proper calibration, as conducted in [14]). Another common assumption is the normalization property:

∫_Ωλ h(u, z(x)) du = 1,   ∀x ∈ Ω   (3)

reflecting energy conservation. We now define the defocus operator in (1) by the following linear operation:

I1 = D1(z) ⋆ r(x) := ∫_Ωλ h1(u, z(x)) r(x − u) du   (4)
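To make the shift-variant operation in (4) concrete, the following sketch applies a per-pixel Gaussian kernel whose scale is driven by a depth-dependent blur map. It is only a minimal, unoptimized reference under the Gaussian-kernel assumption of Section 2.3; the truncated kernel radius, the blur-to-sigma mapping and the function name are illustrative choices, not part of the original method.

```python
import numpy as np

def shift_variant_defocus(radiance, sigma_map, radius=7):
    """Apply Eq. (4) with a Gaussian kernel whose scale varies per pixel.

    radiance  : 2D array, the latent all-in-focus image r(x)
    sigma_map : 2D array, per-pixel kernel scale derived from the depth z(x)
    radius    : half-size of the truncated kernel support (assumed, not from the paper)
    """
    h, w = radiance.shape
    padded = np.pad(radiance, radius, mode='reflect')
    out = np.zeros_like(radiance)
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    for i in range(h):
        for j in range(w):
            s = max(sigma_map[i, j], 1e-3)            # avoid a degenerate kernel
            kernel = np.exp(-(xx**2 + yy**2) / (2.0 * s**2))
            kernel /= kernel.sum()                    # normalization property, Eq. (3)
            patch = padded[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            out[i, j] = np.sum(kernel * patch)        # local, depth-dependent blur
    return out
```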
Note that the defocus function h1(·, ·) is determined by the focus setting (indicated by the subscript) and the local depth z. The second image in our DFD set is captured under a different optical setting and further endowed with a similarity transformation, Px = sRx + t (see Section 2), where s, R, t denote the scale, the rotation matrix and the translation vector, respectively. The transformation P relates the second image frame and the scene's frame, suggesting the following image formation model:

Ĩ2 = ∫_Ωλ h2(R⁻¹u, z(sx + t)) r(sx + t − u) du,   I2 = Ĩ2(Rx)   (5)

The image formation model in (5) relates the geometric transformation to the defocus operator. While the aberration-free assumption detaches the kernel from any dependency on position, the rotation creates a relative angle that should be considered in the image formation, particularly for asymmetric defocus kernels. This imaging model can be described by first applying the defocus with an inversely rotated kernel on a scaled and shifted radiance map, and then rotating the resulting image. Equation (5) extends previous image formation models in DFD to misaligned object-camera frames, under a similarity transformation. In the following propositions we prove the properties of this model.

Proposition 1. Let D be a defocus operator and Px = sx + t a pure translation and optical scale. The defocus and geometric transformation are then commutative, i.e.:

D(z(Px)) ⋆ r(Px) = I(Px)   (6)

where I(x) = D(z(x)) ⋆ r(x). Note that we omit the optical setting index for readability.

Proof. Since R is an identity matrix, our extended image formation model in (5) reduces to:

D(z(Px)) ⋆ r(Px) = ∫_Ωλ h(u, z(sx + t)) r(sx + t − u) du   (7)
Using the change of variables x̃ = sx + t yields:

= ∫_Ωλ h(u, z(x̃)) r(x̃ − u) du = I(x̃) = I(Px)   (8)
Note that the above claim holds for any defocus kernel satisfying the normalization and aberration-free properties.

Proposition 2. Let the function h(·, ·) be a rotationally symmetric defocus kernel and Px = sRx + t a similarity transformation. The defocus and geometric transformation are then commutative.

Proof. Since the kernel is rotationally symmetric:

h(R⁻¹u, ·) = h(u, ·)   (9)

The intermediate image in (5) is then:

Ĩ = ∫_Ωλ h(u, z(sx + t)) r(sx + t − u) du = I(sx + t)   (10)

where the right equality follows from Proposition 1. Rotating the image Ĩ as described in (5) yields the proof:

D(z(Px)) ⋆ r(Px) = Ĩ(Rx) = I(sRx + t) = I(Px)   (11)
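Propositions 1 and 2 can be checked numerically on an equifocal patch. The small sketch below, under the assumption of a rotationally symmetric Gaussian kernel and a pure translation, verifies that blurring and then shifting agrees with shifting and then blurring up to interpolation and boundary effects; all values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

# Minimal numerical check of the commutativity claim on an equifocal patch:
# for a rotationally symmetric kernel and a pure shift, the order of blur and
# geometric transformation should not matter.
rng = np.random.default_rng(0)
r = rng.random((128, 128))          # stand-in radiance patch
t = (3.0, -2.0)                     # translation (pixels), assumed for the test
sigma = 2.5                         # constant blur, i.e. an equifocal surface

blur_then_shift = shift(gaussian_filter(r, sigma), t, order=3)
shift_then_blur = gaussian_filter(shift(r, t, order=3), sigma)

# Away from the boundary the two orders agree up to interpolation error.
err = np.abs(blur_then_shift - shift_then_blur)[10:-10, 10:-10].max()
print(f"max interior discrepancy: {err:.2e}")
```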
2.2 Relative Blur

Commonly, any defocus model assumes the existence of an underlying radiance map as the source of the defocused observations. In practice, however, the radiance map is a-priori unknown and its recovery involves an unstable and noise-sensitive deblurring process. It has previously been shown [10] that, under certain conditions, one can recover the depth map by finding a blur function mapping the observations toward each other, eliminating the need to estimate the radiance. This blur function is known as the relative blur [7], [8]. In this section, we show that under certain conditions the relative blur paradigm can be extended to misaligned images. To this end, we start by expressing the radiance map through the reference (untransformed) observation, according to Eq. (1):

r(x) = D1⁻¹(z) ⋆ I1(x)   (12)

Indeed, the existence of the inverse defocus operator is a key assumption and will be discussed below. Satisfying Propositions 1 and 2 enables commuting the order of the transformation and the defocus in (2). This results in a new representation for our transformed observation, involving a defocus operator independent of the transformation:

I2(P⁻¹x) = D2(z) ⋆ r(x)   (13)

Note that a key factor leading to this relation is the commutation property proved in Propositions 1 and 2. The two input images can now be related directly by substituting (12) into (13), setting T = P⁻¹ for convenience:

I2(Tx) = D2(z) D1⁻¹(z) ⋆ I1   (14)

Similarly, one can derive the reference image from the transformed image by:

I1 = D1(z) D2⁻¹(z) ⋆ I2(Tx)   (15)

Note that neither of the involved defocus operators now depends on the geometric transformation T. Practically, equation (14) expresses a map between the observations by a two-stage process: first defocusing I1 according to the shape (and the corresponding focus setting), obtaining the radiance map, then applying a new defocus operator to form the second observation. The resulting image is then transformed geometrically to yield I2. Equations (14) and (15) express two options for mapping points between the images, one conducted from the source I1 to I2 (14), and the second vice versa (15). Per point (pixel), one option involves a stable "positive" blur and the other an unstable "negative" alternative, namely a deblurring process. Seeking a stable scheme argues for decomposing the image domain into two disjoint sets, where all points are mapped by "positive" blur.

To discuss the existence of the operator Di⁻¹ in (14) and (15), let us restrict ourselves to an equifocal patch. The intensity pattern in this domain is then modelled by a convolution of the corresponding radiance patch and the defocus kernel. For a pillbox kernel, Di is a non-invertible operator, according to the convolution theorem (zero crossings of the kernel frequencies). The inverse operator, however, exists for a Gaussian kernel. In practice, a Gaussian kernel suffers from a high condition number and is therefore sensitive to noise or even rounding errors. This problem is addressed by regularization, such as in the Wiener filter and the more recent approaches of [17], [29]. Interestingly, Subbarao et al. [29] handle the inverse defocus operator by restricting the intensity to behave locally as a third-order polynomial; the inverse problem can then be formulated explicitly (see also [28]).

2.3 Gaussian Kernel

In a parametric approach, we assume the shape of the defocus kernel to be known. A Gaussian kernel can be justified as an approximation for an ideal lens [21]. Generalization from the local approach to the whole domain allows for replacing the defocus operator in (1) by a Gaussian blur, Di(z) = G(Vi). In this representation, Vi: z(x) → R+ is a strictly monotonic function with respect to z. Based on the self-similarity and cascade properties of Gaussian convolution, one can replace the two variance maps Vi, i ∈ {1, 2}, with a single function obtained from their difference, Vr := V2 − V1 (see Appendix A.3, available online, for elaboration). The operators relating the two observations in (14) and (15) can now be described by the following Gaussians with a single unknown, called the relative blur:

D2(z) D1⁻¹(z) = G(Vr),   D1(z) D2⁻¹(z) = G(−Vr)   (16)

For the rest of the paper, we will refer to depth as relative blur, occasionally interchanging these terms. The sign of this newly introduced function indicates the direction of blur, from I1 to I2 or vice versa. This relation creates a matching problem, as previously shown in the case of aligned images [6]–[8], [29], and is equivalent to the correspondence problem in stereo. Note that if the focal plane for imaging I1 is closer to the camera, then positive Vr indicates points near the camera and negative values parts further away.
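The relations (14)-(16) suggest a direct way to compare the two observations without recovering the radiance. The sketch below, reusing the shift_variant_defocus routine from Section 2.1 and an assumed similarity warp, forms the two residuals that are later penalized by the Heaviside split of the energy functional (Section 3); the coordinate conventions, interpolation order and helper names are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import affine_transform

def warp_similarity(img, s, theta, t):
    """Resample img at Tx = s*R(theta)*x + t (bi-cubic interpolation, as in the paper)."""
    c, si = np.cos(theta), np.sin(theta)
    A = s * np.array([[c, -si], [si, c]])
    return affine_transform(img, A, offset=t, order=3, mode='nearest')

def relative_blur_residuals(I1, I2, Vr, s, theta, t):
    """Residuals of the relative-blur relations (14)-(16).

    For pixels with Vr >= 0, I1 is the sharper observation, so G(Vr)*I1 should
    match I2(Tx); for Vr < 0 the direction is reversed. The Heaviside split
    used later in the energy functional selects which residual counts per pixel.
    """
    I2T = warp_similarity(I2, s, theta, t)
    sigma = np.sqrt(np.abs(Vr))                   # kernel scale from the relative variance
    S1 = shift_variant_defocus(I1, sigma) - I2T   # G(Vr) * I1  - I2(Tx)
    S2 = shift_variant_defocus(I2T, sigma) - I1   # G(-Vr) * I2(Tx) - I1
    return S1, S2
```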
3 The Energy Functional

The derived matching paradigm can be formulated as an optimization problem. Our new model endows two
unknowns: the relative blur function Vr and the registering transformation T. The optimization problem is formulated as the minimization of an energy functional, consisting of a data fidelity term Ed and a smoothness prior Es:

[Ṽr(x), T̃] = arg min_{Vr, T} { Ed[Vr, T] + α Es[∇Vr] }   (17)

where the upper tilde denotes the evaluated values and α is a balancing weight. The data term reflects the image formation models (14) and (15). Considering a Gaussian defocus kernel yields (see (16)):

Ed[Vr, T] = ∫_Ω ( ‖G(Vr) ⋆ I1 − I2(Tx)‖_1 H(Vr) + ‖G(−Vr) ⋆ I2(Tx) − I1‖_1 (1 − H(Vr)) ) dx   (18)

Note that the L1 penalization norm for data fidelity introduces robustness to outliers. The Heaviside function H(Vr) divides the image into two disjoint and complementary sub-domains, according to the sign of Vr:

H(Vr) = 1 if Vr ≥ 0,  0 if Vr < 0   (19)

As a regularization term we employ total variation (TV) to favor piecewise smooth depth maps:

Es[Vr] = ∫_Ω ‖∇Vr‖_1 dx   (20)

In practice, a modified L1 norm is used in both the data and regularization terms, ‖f‖_1 ≈ √(‖f‖² + η²), with η a small constant avoiding numerical instability. Actually, this parameter has a broader impact and can be used to tune the norm between L1 and L2 [3].
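A discrete counterpart of (17)-(20), with the modified L1 norm, can be sketched as follows; it reuses the relative_blur_residuals routine from Section 2.3, and the values of α and η are illustrative rather than those used in the paper.

```python
import numpy as np

def modified_l1(x, eta=1e-3):
    """Modified L1 penalty, ||x||_1 ~ sqrt(x^2 + eta^2), used in both terms."""
    return np.sqrt(x**2 + eta**2)

def energy(I1, I2, Vr, s, theta, t, alpha=0.1, eta=1e-3):
    """Discrete counterpart of Eqs. (17)-(20): data fidelity plus TV smoothness."""
    S1, S2 = relative_blur_residuals(I1, I2, Vr, s, theta, t)  # sketch from Sec. 2.3
    H = (Vr >= 0).astype(float)                                 # Heaviside split, Eq. (19)
    E_data = np.sum(modified_l1(S1, eta) * H + modified_l1(S2, eta) * (1.0 - H))
    gy, gx = np.gradient(Vr)                                    # central differences
    E_smooth = np.sum(np.sqrt(gx**2 + gy**2 + eta**2))          # TV with the modified norm
    return E_data + alpha * E_smooth
```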
A necessary condition for the minimizer of (18) is given by the Euler-Lagrange (EL) equations, which yield a pair of coupled PDEs. Variation with respect to Vr results in:

δE/δVr = ψ'(S1²) S1 Î1v H(Vr) + ψ'(S2²) S2 Î2v (1 − H(Vr)) − α div( ψ'(‖∇Vr‖²) ∇Vr ) = 0   (21)

where ψ(·) denotes the modified L1 norm and the following abbreviations are used to allow a compact representation:

S1 := G(Vr) ⋆ I1 − I2(Tx),   S2 := G(−Vr) ⋆ I2(Tx) − I1   (22)

Î1v := δ/δVr [G(Vr) ⋆ I1],   Î2v := δ/δVr [G(−Vr) ⋆ I2(Tx)]

The new terms Î1v and Î2v are functional derivatives with respect to Vr. They endow a high sensitivity and will be elaborated on in Section 4. The additional "hat" sign indicates a function after a defocus blur operation. The second PDE emerges from the variation with respect to the transformation components ζi:

δE/δζi = ∫_Ω ( −ψ'(S1²) S1 ∂/∂ζi [I2(Tx)] H(Vr) + ψ'(S2²) S2 ∂/∂ζi [G(−Vr) ⋆ I2(Tx)] (1 − H(Vr)) ) dx = 0   (23)

The transformation entries ζi correspond to scale, rotation and shift. For details on the derivation of the EL equations, the reader is referred to Appendices A.1 and A.2, available online.
4 The Numerical Scheme
There are several approaches to solving the PDEs in (21) and (23), ranging from explicit gradient descent to semi-implicit schemes [22], [24], [25]. Recently, semi-implicit methods have shown superior performance in terms of stability and convergence rate. In this paper we follow the semi-implicit approach of [8], [22], based on linearization of the finite difference scheme, but with a few modifications. Discretizing (21) and linearizing the scheme by lagged iterations yields:

ψ'((S1^k)²) S1^(k+1) Î1v^k H(Vr^k) + ψ'((S2^k)²) S2^(k+1) Î2v^k (1 − H(Vr^k)) − α div( ψ'(‖∇Vr^k‖²) ∇Vr^(k+1) ) = 0   (24)

where the superscript k stands for a known function (obtained from the previous iteration step) and k + 1 indexes the unknown entity. Note that, as opposed to previous approaches, the norm derivatives ψ'(·) are set with a lagged iteration index to avoid nesting and further computation. The functional derivatives Î1v and Î2v in (24) are often derived analytically and then discretized. However, the resulting analytical expression suffers from numerical instability due to its high sensitivity to Vr, particularly at small values (the exponent and the coefficient are inversely proportional to Vr and to the cube of Vr, respectively). We have thus employed the following approximation for the computation of Î1v and Î2v:

Îv ≈ ( Î(Vr + ΔVr) − Î(Vr − ΔVr) ) / (2ΔVr)   (25)

The upper hat indicates a defocused image. In addition to higher stability, this approximation can directly accommodate an implicit and discretely represented defocus kernel (e.g., in the case of a coded aperture [17]). The intensive computational burden of the multiple defocus operations is handled here by parallel processing. The obtained numerical scheme is still non-linear with respect to Vr. Following [5], [8], we introduce Vr^(k+1) by small updates around the previous iteration:

Vr^(k+1) = Vr^k + dVr^k   (26)

Describing the update as a small perturbation allows a first-order approximation:

Si^(k+1) = Si^k + Îiv^k dVr^k,   i ∈ {1, 2}   (27)

Substituting the linearized terms into the lagged iteration scheme (24) yields a linear system in the newly introduced variable dVr^k. After discretization, a system of linear algebraic equations emerges, described symbolically by:

A^(k−1) dVr^k = b^(k−1)   (28)
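A minimal sketch of the central-difference approximation (25), again under the per-pixel Gaussian model of Section 2.1 (shift_variant_defocus); the step ΔVr and the clamping of negative arguments are assumptions, and only the Vr ≥ 0 branch of (22) is shown.

```python
import numpy as np

def functional_derivative(I, Vr, dV=0.05):
    """Central-difference approximation of Eq. (25):
    derivative of the defocused image with respect to the relative blur Vr.

    The step dV and the per-pixel Gaussian model (via shift_variant_defocus,
    see the sketch after Eq. (4)) are assumptions, not taken from the paper.
    """
    sig_plus = np.sqrt(np.maximum(Vr + dV, 0.0))
    sig_minus = np.sqrt(np.maximum(Vr - dV, 0.0))
    I_plus = shift_variant_defocus(I, sig_plus)     # Î(Vr + ΔVr)
    I_minus = shift_variant_defocus(I, sig_minus)   # Î(Vr − ΔVr)
    return (I_plus - I_minus) / (2.0 * dV)
```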
In this scheme spatial derivatives are discretized by central differences. In the following sections, we elaborate on the iterative solution of this linear system, revealing its equivalence to the celebrated Newton-Raphson scheme. The remaining unknowns are now the registration parameters. Stabilized cameras endure a relatively small
misalignment between the captured images. It is therefore reasonable to use an explicit scheme for the registration PDE in (23). The image warping associated with the alignment uses interpolation and thus "blurs" the intensity edges. This blur component is added to the blur caused by the optical system and therefore affects the depth evaluation. We employ bi-cubic interpolation to reduce this effect, resulting in a mild impact on the defocus. To cope with the emerging coupled PDEs we conduct a joint minimization, modifying the relative blur and the registration parameters successively at each iteration. A Gaussian pyramid is further applied to handle large misalignments and local minima.
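The overall iteration can be outlined as below for a single pyramid level: the relative blur and the registration parameters are updated successively, the former by the linearized solve of (28) and the latter by an explicit gradient step on (23). The helpers solve_linear_system and registration_gradient are hypothetical placeholders, and the step size and iteration count are illustrative; this is a simplified outline, not the paper's exact scheme.

```python
def minimize_level(I1, I2, Vr, params, n_outer=20, step=1e-6):
    """One pyramid level of the joint minimization (simplified outline).

    params = (s, theta, tx, ty) similarity parameters; 'solve_linear_system'
    stands for the Gauss-Seidel solution of Eq. (28) and 'registration_gradient'
    for the discretized Eq. (23); both are hypothetical placeholders.
    """
    for k in range(n_outer):
        # 1) Fix the registration, linearize and solve Eq. (28) for the blur update.
        dVr = solve_linear_system(I1, I2, Vr, params)       # hypothetical helper
        Vr = Vr + dVr                                       # Eq. (26)

        # 2) Fix the relative blur, take an explicit gradient step on Eq. (23).
        grad = registration_gradient(I1, I2, Vr, params)    # hypothetical helper
        params = tuple(p - step * g for p, g in zip(params, grad))
    return Vr, params
```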
4.1 Relation to the Newton-Raphson Iterative Method

A popular numerical approach for an efficient solution of the Euler-Lagrange equations conducts the linearization on the finite difference scheme [4], [5], [8]. While this is commonly preferred to cope with large variations, in this section we provide a theoretical justification for the efficiency of such an approach. To establish this claim, let us restrict ourselves to the data term, assuming without loss of generality that Vr > 0. Under these conditions Eq. (28) reads as:

dVr = −ψ'(S1²) ( Î1 − I2(Tx) ) / Î1v   (29)

Defining the numerator as f := Î1 − I2(Tx) yields the following relation:

dVr ∼ − f(Vr) / f'(Vr)   (30)
The term ψ'(S1²) is a weight function emerging from our penalization, and would reduce to unity for a quadratic norm. The updating scheme in (30) is equivalent to the well-known Newton-Raphson method, which has at least a quadratic convergence rate (for a continuously differentiable function with a non-zero derivative at the solution). This equivalence reflects the efficiency of our semi-implicit scheme.
4.2 Solution of the Fixed Point Iterations and Proof of Convergence

Eq. (28) presents a sparse linear system for the discrete vector of variables dVr, and can now be approached for a solution. The most relevant tools for this task are iterative schemes known as relaxation methods. In this category there are two classes: stationary methods, e.g., Jacobi, Gauss-Seidel (GS) or SOR, and non-stationary approaches, e.g., conjugate gradients. Gauss-Seidel offers an efficient iterative approach for large and sparse linear systems. Although GS is inherently sequential, a known relaxation called Red-Black allows for a parallel alternative suitable for GPU computing. In this section, we prove the convergence of the inner fixed-point iterations described in (28).
Lemma 1. The fixed-point coefficient matrix in (28) is strictly diagonally dominant.

Proof. Consider an image of size M × N. The coefficient matrix in (28) is then a sparse matrix of size MN × MN, having a tridiagonal part and two additional off-diagonal bands. The matrix entries are obtained from the finite difference scheme in (24). In a column-stacking format, the diagonal terms of (28), a_{m,m}, can be decomposed into two parts:

a_{m,m} = D_m + S_m   (31a)
D_m := ψ'(S1²)(Î1v)² H(Vr) + ψ'(S2²)(Î2v)² (1 − H(Vr))   (31b)
S_m := (α/2)(4φ_m + φ_{m−1} + φ_{m+1} + φ_{m+M} + φ_{m−M})   (31c)

where D_m and S_m are obtained from the data and smoothing terms respectively, and

φ := (‖∇Vr‖² + η²)^(−1/2),   ψ'(S²) = (S² + η²)^(−1/2)   (32)

The sum of the absolute off-diagonal entries of A^k in the m-th row, R_m, is precisely S_m:

R_m := Σ_{n=1, n≠m}^{MN} |a_{m,n}| = |S_m| = S_m   (33)

where the right equality is due to the fact that φ > 0 (see the definition in (32)). All assignments are made at iteration k. To prove the diagonal dominance of A^k, we first calculate the (signed) distance between the diagonal entry and the sum of the off-diagonal terms:

K_m := |a_{m,m}| − R_m = D_m   (34)

As S_m cancels out, the remainder is D_m in (31b). We are now interested in the sign of this term. To this end, let us examine the components of D_m as shown in (31b). The first factor is positive, ψ'(S²) > 0, by definition (see (32)). Assuming a radiance map with high frequency content (a necessary condition for DFD) assures Î1v, Î2v ≠ 0, inferring strict positivity of the corresponding quadratic terms in (31b). Finally, our approximation of the Heaviside function,

Hε(Vr) = (1/2)(1 + (2/π) arctan(Vr/ε)),   ε > 0   (35)

yields 0 < H(Vr) < 1. Having shown the positive sign of all the components of D_m proves the positivity of this term, and therefore of the equal term K_m in (34). The coefficient matrix A^k in (28) is hence strictly diagonally dominant.

Corollary 1. The fixed-point matrix is positive definite.

Proof. According to the Gershgorin circle theorem, the eigenvalues of A^k must lie within discs of radius R_m, as derived in (33), centered at the diagonal entries. The diagonal entries of A^k satisfy a_{m,m} > 0, due to the positivity of their components S_m and D_m as proved in Lemma 1. Since the matrix A^k is strictly diagonally dominant and all its diagonal elements are positive, all the eigenvalues are strictly positive and therefore A^k is positive definite.

Corollary 2. The inner Gauss-Seidel iteration in (28) is convergent.

Proof. The convergence properties of the Gauss-Seidel iterative scheme depend on the coefficient matrix A^k.
Fig. 3. Synthetic and simulated data sets. Left to right: Synthetic plane, equifocal surface with asymmetric blur, Cup and Cylinders. Designated squares show zoomed-in image patches to better visualize the change in blur. Note how changes in the optical setting yield different ranges of blur.
TABLE 1 Misalignments and Registration Errors for Different Approaches
On top, the ground truth: Translation (tx, ty) in pixels, rotation (θ) and incremental scale between the images (s). Below: Registration errors for the following approaches. Appearance: the dense appearance-based registration; SIFT: sparse SIFT-based registration; Our: our unified DFD and registration method. The resulting depth maps, influenced by the registration accuracy, are shown in Fig. 4. Dash signs in the SIFT results indicate registration failure due to a lack of key points.
The strict diagonal dominance of A^k in (28), proven in Lemma 1, infers the convergence of the resulting iterative scheme [1].
4.3 Parallel Implementation

The obtained numerical scheme is computationally intensive, particularly due to the recursive defocus operations. Although our defocus operator is shift-variant (in contrast with a shift-invariant convolution), its locality allows for fine-grain parallelism. In fact, such a case is typical for many PDE solvers [13], [20], [30] and particularly for DFD, as previously shown in [2]. For real-time operation (on reasonable image sizes), we adapt our numerical scheme to harness the fine parallelism available on a GPU. The main challenge in this respect is the efficient solution of the sparse linear system in (28). On a CPU, such linear systems are often solved by Gauss-Seidel or SOR relaxation methods, which are inherently sequential. However, the
classic Gauss-Seidel scheme can be relaxed by the Red-Black strategy to allow for a parallel alternative [30]. A single Red-Black relaxation then consists of two half iterations, each updating every alternate point (called red and black points). The updates of all the red (respectively black) points are now parallel, as all the dependencies are the neighboring counter-color pixels from the previous half iteration. Finally, our gradient descent steps introduce an explicitly parallel process, similar to the implementation in [2]. Note that the integral (sum) computation in (23) can be efficiently distributed by a reduction process on the GPU. Another heavy process harnessing GPU computing is the image warp, which involves repetitive interpolations. Our scheme typically converges in less than 20 iterations, over 2-3 levels of the pyramid. The fast converging scheme, together with fine-grain GPU parallelism, yields a rate of 0.5 MP depth values per second on an NVIDIA GeForce GTX 460. This performance ramps up by a factor of 10 when
Fig. 4. Evaluated depth (relative blur) for the data sets in Fig. 3 (see also Table 1 for misalignments and registration errors). The depth color code is shown at the bottom. (a) Without registration, (b) appearance-based registration, (c) SIFT-based alignment, (d) our approach, and (e) DFD result for the original aligned data, shown for comparison. From top to bottom: Synthetic plane, constant defocus with an asymmetric kernel, Cup and Cylinders. Note the significant effect of the misalignment in (a). The unified approach outperforms the tested alternatives by showing results that are closest to the depth maps obtained from aligned data. Note the example of asymmetric defocus: while the separate approaches include artifacts, ours yields a constant depth as expected (the slight color difference in the blur level w.r.t. the aligned data is of sub-pixel order). The failure of SIFT matching for the Cup set is due to a lack of key points.
the registration mode is off, i.e., DFD is applied to aligned images. In spite of the high throughput, there is still room for speed-up by device optimization.
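One Red-Black sweep for a 5-point-stencil system such as (28) can be sketched as follows; the coefficient layout (one diagonal array and four neighbor arrays) is an assumed representation, and boundary handling is simplified.

```python
import numpy as np

def red_black_gauss_seidel_sweep(x, diag, off, rhs):
    """One Red-Black Gauss-Seidel sweep for a 5-point-stencil system A x = b.

    x    : current solution (e.g. dVr), 2D array
    diag : diagonal coefficients a_mm, 2D array
    off  : dict with 'n','s','e','w' off-diagonal coefficient arrays
    rhs  : right-hand side b, 2D array
    All points of one color can be updated in parallel (e.g. on a GPU), since
    they depend only on the other color from the previous half-iteration.
    """
    h, w = x.shape
    ii, jj = np.meshgrid(np.arange(h), np.arange(w), indexing='ij')
    for color in (0, 1):                        # red points, then black points
        mask = ((ii + jj) % 2 == color)
        mask[0, :] = mask[-1, :] = mask[:, 0] = mask[:, -1] = False  # skip boundary
        nb = (off['n'] * np.roll(x, 1, axis=0) + off['s'] * np.roll(x, -1, axis=0) +
              off['w'] * np.roll(x, 1, axis=1) + off['e'] * np.roll(x, -1, axis=1))
        x[mask] = (rhs[mask] - nb[mask]) / diag[mask]
    return x
```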
5 Experimental Validation
In this section, we demonstrate the effectiveness of our approach on three types of data sets: synthetic, simulated (where real images were computationally misaligned) and newly introduced real data captured in the wild. We further demonstrate the shortcomings of standard registration methods, particularly in the case of an asymmetric defocus kernel.
For comparison, we conduct the commonly separated approach, where images are first registered and then used for DFD. The performance is then measured by the accuracy of the registration (where applicable) and of the recovered shape. To this end, we employ two different registration methods. The first is based on dense appearance matching, formally minimizing the following objective functional:

ER[T] = ∫_Ω √( [I1 − I2(Tx)]² + η² ) dx   (36)
We minimize this energy functional by gradient descent to obtain the registration, similar to our unified approach.
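A minimal sketch of this appearance-based baseline, using numerical gradients over the four similarity parameters for brevity; the step sizes, iteration count and the reuse of the warp_similarity sketch from Section 2.3 are illustrative assumptions, not the implementation used in the experiments.

```python
import numpy as np

def dense_registration(I1, I2, n_iter=200, step=1e-7, eta=1e-3):
    """Dense appearance registration baseline, Eq. (36), by gradient descent."""
    def energy(p):
        s, theta, tx, ty = p
        diff = I1 - warp_similarity(I2, s, theta, (tx, ty))
        return np.sum(np.sqrt(diff**2 + eta**2))

    p = np.array([1.0, 0.0, 0.0, 0.0])          # identity initialization (s, theta, tx, ty)
    eps = np.array([1e-4, 1e-4, 1e-2, 1e-2])    # finite-difference steps per parameter
    for _ in range(n_iter):
        grad = np.array([(energy(p + e * np.eye(4)[i]) - energy(p - e * np.eye(4)[i])) / (2 * e)
                         for i, e in enumerate(eps)])
        p = p - step * grad                     # explicit gradient-descent step
    return p
```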
Fig. 5. Real data set. From left to right: Antlion, Spider, and Aphrophoridae.
TABLE 2 Misalignments Evaluated for the Real Data, Indicated by Translation (tx , ty ) in Pixels, Rotation (θ ) and Incremental Scale between the Images (s)
Appearance: Registration based on appearance similarity; SIFT: SIFT-based registration; Our: our unified DFD and registration approach. Note the relatively large scale (over 2%) in the Spider case. While the SIFT registration agrees with our method in the Antlion case, there are discrepancies in the Spider case and a failure in the Aphrophoridae case due to a lack of key points. Inaccuracy in registration yields artifacts in the recovered depth maps. See Figs. 6 and 7 for details.
The second comparison is made with registration based on SIFT [19] key points, well known for its robustness and invariance to scale and rotation. Registration is then based on fitting a similarity transformation to the corresponding points (found according to their descriptors), using least squares.
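Fitting a similarity transformation to matched key points reduces to a linear least-squares problem in the parameterization a = s cos θ, b = s sin θ. A minimal sketch (without the outlier rejection one would normally add) is given below; the function name and interface are illustrative.

```python
import numpy as np

def fit_similarity(src, dst):
    """Least-squares similarity transform from matched key points.

    src, dst : (N, 2) arrays of corresponding points (e.g. SIFT matches).
    Solves dst ~ s*R(theta)*src + t via the linear parameterization
    [a, -b; b, a], a = s*cos(theta), b = s*sin(theta). Returns (s, theta, t).
    """
    N = src.shape[0]
    A = np.zeros((2 * N, 4))
    b = dst.reshape(-1)                       # interleaved x', y' targets
    A[0::2, 0], A[0::2, 1] = src[:, 0], -src[:, 1]
    A[1::2, 0], A[1::2, 1] = src[:, 1],  src[:, 0]
    A[0::2, 2] = 1.0
    A[1::2, 3] = 1.0
    (a, bb, tx, ty), *_ = np.linalg.lstsq(A, b, rcond=None)
    s = np.hypot(a, bb)
    theta = np.arctan2(bb, a)
    return s, theta, np.array([tx, ty])
```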
5.1 Synthetic and Simulated Images

Our synthetic and simulated data consist of four image pairs, as shown in Fig. 3. The first synthetic data set encodes the blur variation of a slanted plane, and the second is a demonstration of an equifocal plane defocused by an asymmetric kernel. The asymmetric kernel used was a Gaussian whose left side (negative x values) was set to zero. This simple example demonstrates a limitation of standard, separate registration methods, where the directional blur is commonly perceived as displacement, resulting in registration errors. We further test the methods on simulated real data, where one image is intentionally warped to create a controlled misalignment. To this end, the celebrated Cup pair from [33] and the Cylinders pair from [7] were used. Having the ground truth transformation at hand, we report the registration accuracies in Table 1. Our results show high accuracy in all the test cases, with errors below one quarter of a pixel in translation, and up to 0.001° in rotation and 0.1% in scale. With separate registration and DFD, however, we find significantly higher errors. For instance, in the asymmetric defocus test case (ASK), both competing methods drift as expected, due to the directional blur. While the SIFT registration succeeds in two of the four tests, larger errors are obtained in the Cylinders case, and a lack of key points in the Cup set9 yields a registration failure (indicated by dash signs in the table). Errors in registration impact the recovered depth map. Fig. 4 demonstrates how conducting DFD on raw (misaligned) data harms the depth map evaluation. Although prior registration improves this outcome, it still leaves a few noticeable artifacts. Inconsistent depth values are particularly present in the asymmetric kernel example, due to the registration errors reported in Table 1. Although the SIFT matching outperforms the appearance method, the best results are still obtained by the unified approach,

9. Based on the SIFT code provided by David Lowe at
http://www.cs.ubc.ca/~lowe/keypoints/. We have also tested the OpenCV implementation on all the test cases. It resulted in key-points which in most cases were wrongly matched.
that successfully copes with all the test cases. Note the results for the aligned data (without the added misalignment), shown for comparison in the right column of Fig. 4. In the Cup case, for instance, our depth map is comparable to that of the aligned data, with a reported registration accuracy of 0.1 pixel. The preservation of discontinuities in the depth maps is a consequence of our TV regularization term.
5.2 Real Data

We further extend our test bed with new real data captured in nature10. The data includes images of tiny creatures at a scale in the intermediate domain between the microscopic and the typical macroscopic field, where, to the best of our knowledge, no previous data for DFD has been shown. In particular, the images were captured in the wild of a living antlion, a spider and two insects of the Aphrophoridae family. The camera used was a semi-professional high-end device, a Canon 5D Mark II. Images were captured with the camera's color sensor operated in 21 MP, 14-bit raw mode, then demosaiced in Adobe Camera Raw, transformed to gray level and down-sampled to 1 MP (for the imaging parameters the reader is referred to the project page11). Images of our real data set are shown in Fig. 5 (in color). The scenarios include an antlion on a groundsel-shaped plant, a spider on top of a flower and two Aphrophoridae on a nest covered by white bubbles. This challenging data set introduces fine structures such as the sepals and the antennae of the antlion, the spider's legs and extremely thin elements like the spider's hairs. The focus setting was changed by mounting the camera on a rail and manually moving it between the shots along the optical axis. The captured images endure misalignment due to camera displacement and a relative scale caused by the optical magnification. The misalignment components are up to a 2 pixel shift, 0.2° rotation and 2.3% scale (based on estimation). Table 2 summarizes the misalignments evaluated by our approach and the comparison methods. The recovered depth maps are shown in Figs. 6 and 7. Interesting details in the results are the correct depth order of the antlion's and spider's limbs. As Fig. 6 shows, although the misalignment in the raw data causes a corrupted depth map, prior registration indeed improves the results. However, a qualitative examination shows that

10. Courtesy of Ilia Lutsker - Orbotech Ltd.
11. www.cs.bgu.ac.il/~rba/dfd/DFDProjectPage.html
while comparable results are obtained with SIFT matching for the antlion, overall the best results are still achieved by the unified framework. In our depth map for the antlion, the distance order of the limbs, particularly the wings, antennae and legs, is correctly recovered. The depth resolution is sufficient to expose the slanted angle of the insect's wing. More details are shown in Fig. 7. Interestingly, the results reveal the existence of a second wing underneath the front one, hardly observable from the images (see the lower row of the antlion close-ups in Fig. 7). This observation indicates a submillimetre depth resolution. In the Spider case, the flower petals and the spider's limbs, such as the head, the legs and the claws in front, are well recovered with their relative depth order. The result in the Aphrophoridae case indicates the nest structure and the insects on it.

Fig. 6. Depth maps (relative blur) for the real data set. (a) Without registration, (b) Appearance-based registration, (c) SIFT matching, and (d) Our approach. From top to bottom: Antlion, Spider, and Aphrophoridae. Columns (a) and (b) are dominated by artifacts. The SIFT matching fails in the Aphrophoridae case (third row) due to a lack of key points. The color coding presents red as near and blue as far from the camera. Note that the background obtains an intermediate zero blur due to a lack of texture.

A comparative examination of Figs. 6 and 7 shows the benefits of our approach. While in the antlion case the appearance-based registration yields artifacts, the SIFT-based alternative and our approach both show improved results. In the spider test set, however, the unified approach outperforms the SIFT registration, as inconsistent depths appear in the flower, the spider's head and fine structures such as the spider's legs. Furthermore, due to the relatively smooth pattern in the Aphrophoridae set, the SIFT approach fails (caused by a lack of key points). This demonstrates another case where our appearance-based, unified framework has an advantage over sparse and separate approaches. We further provide a 3D visualization of our depth maps in Fig. 8. Eminent details are visualized in this representation. Note the 3D structure of the antlion's double-folded wing, the flower petals in the spider image, as well as the 3D poses of the spider's legs, head and claws.

In addition to depth, the relative blur actually labels the sharper pixels, by determining the direction of blur between the two observations (see Section 2.3). This characteristic can be further utilized to fuse the input images into a new view with improved sharpness. We call this new image the pseudo-radiance12 and compute it by the following fusion equation:

r̃(x) = I1(x) H(Vr) + I2(Tx) (1 − H(Vr))   (37)

where the Heaviside function provides the (soft) labeling. The resulting pseudo-radiance images obtained from our real data are shown in Fig. 9. Note how these images are fused seamlessly to suggest a sharper view. Close-ups show relevant parts of the input images and their corresponding sharper variant fused into the pseudo-radiance map. This is an outcome of our reliable depth ordering. On the project page13, we also provide 3D animations created by rendering our recovered shapes with the pseudo-radiance image.
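A sketch of the fusion in (37), using a smooth arctan-based Heaviside in the spirit of (35) for soft labeling; the smoothing width is an illustrative choice, and I2 is assumed to be pre-warped by the recovered transformation.

```python
import numpy as np

def pseudo_radiance(I1, I2T, Vr, eps=0.5):
    """Fuse the two observations into a sharper view, Eq. (37).

    I2T is I2 already warped by the recovered transformation T.
    The smoothing width eps of the soft Heaviside is an assumed value.
    """
    H = 0.5 * (1.0 + (2.0 / np.pi) * np.arctan(Vr / eps))   # soft Heaviside, cf. Eq. (35)
    return I1 * H + I2T * (1.0 - H)
```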
6 Summary and Discussion
Depth from Defocus (DFD) exploits optical blur to infer the shape of a scene from just two images. A strong requirement of this approach is accurate alignment between the input images: a slight misalignment causes significant errors in the recovered depth map. Previous DFD methods perform registration between the images as a preprocessing stage. However, as the depth of field is decreased to allow for a higher depth resolution, the blur dominates, deteriorating the capability of standard approaches to accurately register the inputs. While robust registration methods such as SIFT point matching show invariance to rotation and scale, and partially to constant blur, they commonly lack the capability to accurately register images under shift-variant and particularly asymmetric blur.

12. In contrast with the radiance, where all image parts are in focus.
13. http://www.cs.bgu.ac.il/~rba/dfd/DFDProjectPage.html
Fig. 7. Close-ups on the depth maps (relative blur). These close-ups show details and artifacts as obtained from the different approaches. (a) Without registration. (b) Appearance-based registration. (c) SIFT matching. (d) Our approach. First row, Antlion: Outliers are observed in the head as well as the legs. At the front wing, one can distinguish the folded secondary wing, a delicate detail hardly observed in the image. Second row, Spider: Artifacts are observed on the flower and the leg. Note that the spider's leg is correctly recovered only in our approach. Third row, Aphrophoridae: Observed inconsistencies are marked by arrows, as sharp depth variations. The SIFT matching fails here due to a lack of key points.
Based on the coupled nature of registration and blur, we propose a unified model for depth from defocus. To this end, we formally derive the image formation model and the corresponding relative-blur paradigm in the presence of misalignment. Assuming a rigid in-plane motion of the object/camera, the observed images are related by a similarity transformation (in addition to the optical blur). Accordingly, the intuitive registration conducted by warping one image
toward the other is justified for a rotationally symmetric defocus kernel. However, in case of an arbitrary kernel this relation is satisfied only for pure translation of the object. The suggested framework accommodates both registration and depth (as relative blur) in a single unified objective functional. The solution is then approached by a numerical scheme based on fixed point iterations and gradient descent. We further show the equivalence of
Fig. 8. Color-coded 3D visualization for real data set (please refer to the electronic copy or the project page for better visibility).
the derived numerical scheme to the celebrated Newton-Raphson method and prove the convergence of the iterative scheme associated with our linear system. This reflects the stability of the derived numerical scheme. A pyramidal
approach (up to 2-3 levels) allows recovery of large misalignments and copes with local minima. Experimental evaluations cover synthetic data as well as real images from the literature and newly introduced
Fig. 9. Synthesized best-focus image (pseudo-radiance image). (a) Near focus. (b) Far focus. (c) Best-focus image. Note that while parts appear in and out of focus in the inputs, our pseudo-radiance image includes the sharper choice.
photographs captured in the wild. The tests show successful registration and recovery of the target shape. To emphasize the capabilities of the suggested approach, experimental evaluations were made against two alternatives conducting separate registration and DFD. The outcome shows that, overall, our combined approach outperforms the comparison methods in both registration and depth recovery. DFD is recognized as a computationally intensive method for 3D reconstruction. In this paper, we handle the computational burden by GPU computing. A new numerical scheme harnessing the fine parallelism of GPU computing yields a runtime of 2 seconds for a 1 MP image. The computational bottleneck is the explicit registration scheme, as the throughput ramps up over 10-fold if DFD is run without it. Future work can move in various directions. On the theoretical side, there is room for an analysis of DFD sensitivity with respect to registration. The impact of the interpolation in image warping inspires another direction, as this approximation imposes a blur effect undesirably merged with the optical blur. For further speed-up, we intend to focus on more efficient schemes to recover the registration in our model. In an era where multi-core devices become ubiquitous, the suggested approach presents a practical alternative for inferring a realistic depth map that can further be used for focus manipulation, object detection and segmentation.
Acknowledgments

I would like to thank Dr. S. Rippa for sharing with me his vast knowledge of numerical algebra, my colleague G. Raveh for his devoted assistance with the GPU implementation, and, last but not least, I. Lutsker for providing his amazing photographs. I further thank the anonymous reviewers for their thorough review and constructive remarks, which led to a major improvement of this manuscript.
References

[1] R. Bagnara, "A unified proof for the convergence of Jacobi and Gauss-Seidel methods," SIAM Rev., vol. 37, no. 1, pp. 93–97, Mar. 1995.
[2] R. Ben-Ari and G. Raveh, "Variational depth from defocus in real-time," in Proc. IEEE Int. Conf. ICCV, Barcelona, Spain, 2011, pp. 522–529.
[3] R. Ben-Ari and N. Sochen, "A geometric framework and a new criterion in optical flow modeling," J. Math. Imaging Vision, vol. 33, no. 2, pp. 178–194, 2009.
[4] R. Ben-Ari and N. Sochen, "Stereo matching with Mumford-Shah regularization and occlusion handling," IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 11, pp. 2071–2084, Nov. 2011.
[5] T. Brox, A. Bruhn, N. Papenberg, and J. Weickert, "High accuracy optical flow estimation based on a theory for warping," in Proc. 8th ECCV, Prague, Czech Republic, 2004, pp. 25–36, LNCS 3024.
[6] J. Ens and P. Lawrence, "An investigation of methods for determining depth from focus," IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 2, pp. 97–108, Feb. 1993.
[7] P. Favaro, "Shape from defocus via diffusion," IEEE Trans. Pattern Anal. Mach. Intell., vol. 30, no. 3, pp. 518–531, Mar. 2008.
[8] P. Favaro, "Recovering thin structures via nonlocal-means regularization with application to depth from defocus," in Proc. IEEE Conf. CVPR, San Francisco, CA, USA, 2010.
[9] P. Favaro and S. Soatto, "A geometric approach to shape from defocus," IEEE Trans. Pattern Anal. Mach. Intell., vol. 27, no. 3, pp. 1–12, Mar. 2005.
[10] P. Favaro and S. Soatto, 3-D Shape Estimation and Image Restoration: Exploiting Defocus and Motion Blur. London, U.K.: Springer-Verlag, 2007.
[11] P. Green, W. Sun, W. Matusik, and F. Durand, "Multi-aperture photography," in Proc. ACM SIGGRAPH, New York, NY, USA, 2007.
[12] M. Gupta, A. Agrawal, A. Veeraraghavan, and S. G. Narasimhan, "Structured light 3D scanning in the presence of global illumination," in Proc. IEEE Conf. CVPR, Providence, RI, USA, 2011, pp. 713–720.
[13] P. Gwosdek, H. Zimmer, S. Grewenig, A. Bruhn, and J. Weickert, "A highly efficient GPU implementation for variational optic flow based on the Euler-Lagrange framework," in Proc. 3rd ECCV Workshop CVGPU, Heraklion, Greece, 2010.
[14] S. W. Hasinoff and K. N. Kutulakos, "Confocal stereo," Int. J. Comput. Vision, vol. 81, no. 1, pp. 82–104, 2008.
[15] Y. S. Heo, M. Lee, and S. U. Lee, "Robust stereo matching using adaptive normalized cross-correlation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 4, pp. 807–822, Apr. 2011.
[16] A. Kumar and N. Ahuja, "A generative focus measure with application to omnifocus imaging," in Proc. Int. Conf. Computational Photography, Cambridge, MA, USA, 2013, pp. 112–119.
[17] A. Levin, R. Fergus, F. Durand, and W. T. Freeman, "Image and depth from a conventional camera with a coded aperture," in Proc. SIGGRAPH, New York, NY, USA, 2007.
[18] X. Lin, J. Suo, G. Wetzstein, Q. Dai, and R. Raskar, "Coded focal stack photography," in Proc. ICCP, Cambridge, MA, USA, 2013.
[19] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91–110, 2004.
[20] J. Mairal, R. Keriven, and A. Chariot, "Fast and efficient dense variational stereo on GPU," in Proc. 3DPVT, Chapel Hill, NC, USA, 2006.
[21] Z. Myles and N. da Vitoria Lobo, "Recovering affine motion and defocus blur simultaneously," IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 6, pp. 652–658, Jun. 1998.
[22] N. Papenberg, A. Bruhn, T. Brox, S. Didas, and J. Weickert, "Highly accurate optic flow computation with theoretically justified warpings," Int. J. Comput. Vision, vol. 67, no. 2, pp. 141–158, 2006.
[23] D. Robinson and P. Milanfar, "Fundamental performance limits in image registration," IEEE Trans. Image Process., vol. 13, no. 9, pp. 1185–1199, Sept. 2004.
[24] G. Rosman, L. Dascal, A. Sidi, and R. Kimmel, "Efficient Beltrami image filtering via vector extrapolation methods," SIAM J. Imaging Sci., vol. 2, no. 3, pp. 858–878, 2009.
[25] G. Rosman, L. Dascal, X. C. Tai, and R. Kimmel, "On semi-implicit splitting schemes for the Beltrami color image filtering," J. Math. Imaging Vision, vol. 40, no. 2, pp. 199–213, 2011.
[26] N. Sabater, A. Almansa, and J. M. Morel, "Meaningful matches in stereo vision," IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 5, pp. 930–942, May 2012.
[27] W. J. Smith, Modern Optical Engineering, 3rd ed. McGraw-Hill, 2000.
[28] P. Soon-Yong, "An image-based calibration technique of spatial domain depth-from-defocus," Pattern Recognit. Lett., vol. 27, no. 12, pp. 1318–1324, 2006.
[29] M. Subbarao and G. Surya, "Depth from defocus: A spatial domain approach," Int. J. Comput. Vision, vol. 13, no. 3, pp. 271–294, 1994.
[30] N. Sundaram, T. Brox, and K. Keutzer, "Dense point trajectories by GPU-accelerated large displacement optical flow," in Proc. ECCV, Heraklion, Greece, 2010, pp. 438–451.
[31] R. Szeliski, "Recovering 3D shape and motion from image streams using nonlinear least squares," in Proc. CVPR, New York, NY, USA, 1993.
[32] M. Watanabe, S. Nayar, and M. Noguchi, "Real-time computation of depth from defocus," in Proc. SPIE, vol. 2599, 1996, pp. 14–25.
[33] M. Watanabe and S. K. Nayar, "Rational filters for passive depth from defocus," Int. J. Comput. Vision, vol. 27, no. 3, pp. 203–225, 1998.
[34] J. Weickert, Anisotropic Diffusion in Image Processing. Stuttgart, Germany: Teubner, 1998.
[35] C. Zhou, S. Lin, and S. K. Nayar, "Coded aperture pairs for depth from defocus," in Proc. ICCV, Kyoto, Japan, 2009.
Rami Ben-Ari received the B.Sc. and M.Sc. degrees in aerospace engineering from the Technion - Israel Institute of Technology, Israel, and the Ph.D. degree in applied mathematics from Tel Aviv University, in 2008. From 2008 to 2009, he was an associate researcher at the Technion and at Ben-Gurion University. In 2009, he joined Orbotech Ltd., the world leader in automated optical inspection for electronics manufacturing, as an algorithm researcher, and was later promoted to expert in computer vision. Currently, he is with eyesight-Technologies as a Senior Algorithm Researcher, working on visual tracking and gesture recognition. His current research interests include computational photography, PDE methods in computer vision, and machine learning, with a mission to bring state-of-the-art academic research into industrial applications.