17th Computer Vision Winter Workshop, Matej Kristan, Rok Mandeljc, Luka Čehovin (eds.), Mala Nedelja, Slovenia, February 1-3, 2012

Online Model-Based Multi-Scale Pose Estimation

Thomas Kempter, Andreas Wendel, and Horst Bischof
[email protected], {wendel, bischof}@icg.tugraz.at

Institute for Computer Graphics and Vision, Graz University of Technology, Austria

Abstract. We present a novel model- and keypoint-based pose estimation approach which extends state-of-the-art visual Simultaneous Localization And Mapping (SLAM) by using prior knowledge about the geometry of a single object in the scene. Using keypoint-based localization, our object of interest is generally tracked by its surroundings. Additionally, we incorporate the knowledge about the object's geometry to initialize the SLAM algorithm metrically, to refine the object's pose and thus increase accuracy, and finally to bridge loss of tracking with a purely model-based tracking component when the keypoint-based approach lacks distinctive features. Experiments show an improvement of the mean translational error from 6.1 cm to 1.7 cm for solid objects, and from 6.9 cm to 2.6 cm for wiry objects. Furthermore, by the use of model-based tracking we reduce the number of pose estimation failures by more than 20%.

1. Introduction

The field of robotics has been constantly growing over the past decades. Along with this development, computer vision and its applications have become increasingly important. The data acquired by vision systems can be used in a variety of applications such as visual navigation, grasping of objects, or quality inspection. Our work is motivated by the task of autonomous overhead power line inspection by an Unmanned Aerial Vehicle (UAV), which not only cuts costs but also reduces the risk for the crew performing the inspection to a minimum. Therefore, recognizing a power pylon as well as estimating the camera pose relative to such a fixed, rigid structure is essential. Initial experiments have shown that it is difficult to estimate a pose relative

to complex structures like power pylons from single images. We therefore focus on a tracking approach which exploits discriminative features in the background of the object. Thus we can track complex structures and even determine a pose when parts of the object or even the entire object are occluded. Additionally, a wireframe Computer Aided Design (CAD) model of the object is used for metric initialization as well as in a refinement step, where the pose estimate delivered by keypoint-based tracking is enhanced. This improves accuracy especially when the camera is close to the object. Furthermore, when keypoint-based tracking fails due to a lack of distinctive features, purely model-based tracking takes over. This allows us to avoid tracking failures until explored areas are revisited, where we seamlessly switch back to keypoint-based tracking.

2. Related Work

The problem of determining the position and orientation of a camera is known as pose estimation. When the camera has been calibrated beforehand, six degrees of freedom (DOF) remain: three DOF for the rotation matrix R and another three DOF for the translation vector t. As this topic is almost as old as the field of computer vision, it has been studied extensively, and three groups of algorithms can be distinguished by the way they determine the pose. For some time, view- or appearance-based approaches [2, 4, 5, 20], especially eigenspace approaches, were popular due to their ability to handle the combined effects of shape, pose, reflections, partial occlusions, and illumination changes. These methods compare the query image to precomputed views of the object to determine its pose. The major drawback of such approaches is that they try to deal with the full six-dimensional search space by clustering

the precomputed 2D views of the object. Furthermore, due to the comparison with precomputed views, these approaches are not suitable for non-solid objects such as power pylons. Another class are descriptor-based methods [1, 14, 21], where artificial views of the object are generated and descriptors are extracted at salient image locations. Based on these descriptors, a classifier is trained. During the query phase, descriptors are again extracted at salient image locations and classified in order to determine the object's pose. Despite their good performance in some applications, such approaches are limited to textured objects. As we are looking for a more general approach to determine the pose of objects with little or no texture, these methods are not suitable either. Finally, correspondence-based approaches [3, 6, 9, 13, 22] use either points, lines, or other geometric primitives like triangles, and therefore circumvent the search in the large pose space. However, they have to deal with the search space resulting from establishing point correspondences if no prior knowledge is given. This search space grows even bigger when additional features are introduced by clutter. This can be avoided by choosing more complex features such as lines, which reduces the search space on the one hand but makes the algorithm less robust to, e.g., partial occlusions of the object. Nevertheless, only correspondence-based approaches are suitable for wiry objects without any associated texture. When 3D-to-2D point correspondences are not given, the pose has to be estimated while simultaneously hypothesizing correspondences. Typically, RANdom SAmple Consensus (RANSAC) [13] style approaches are applied successfully to such problems. However, they are not suitable for time-critical applications either, as the number of iterations needed to obtain satisfactory results increases quickly with the number of 3D model and detected 2D image points. Therefore, and due to the fact that we need to estimate the pose not only for single images but for a continuous sequence, SLAM approaches are most suitable, as the camera only undergoes small changes from one frame to the next. A popular real-time Visual SLAM (VSLAM) implementation is that of Davison [7]. He was the first to propose real-time capable, single-camera localization by creating a map of sparse natural features.

The Bayesian framework handles uncertainties in a way that makes robust localization possible. Based on this approach, Davison et al. proposed MonoSLAM [8]. They extend the previously proposed method of monocular feature initialization by a feature orientation estimation. This successfully increases the range in which landmarks are detected and hence improves tracking. However, MonoSLAM only works for slow camera motions or in combination with further sensor inputs. Bleser et al. [3] proposed a robust real-time camera pose estimation approach for partially known scenes. The connection between the real and the virtual world is made by one known object, given as a CAD model. However, the model is only used during initialization; afterwards, point-based tracking is applied to estimate the pose using warped patches around extracted feature locations. Others use model-based pose estimation [17, 23] together with gyroscopic measurements to cope with fast motions and the resulting motion blur. While [17] is used in Augmented Reality (AR) for head-mounted displays, [23] targets outdoor AR and extends the former work of Klein by using a texture-based model and online edge extraction. Both approaches use the edge-based tracking algorithm by Drummond and Cipolla [10]. The work of Kyrki and Kragic [19] is most closely related to ours, as they investigated the tracking of rigid objects using the integration of model-based and model-free cues in an iterated Extended Kalman Filter (EKF). The model-based cue uses a wireframe edge model of the object, whereas the model-free cue uses automatically generated surface texture features. Thus, the approach relies on both edge and texture information. The model-based tracking is similar to [10, 11], who estimate the normal flow for points along edges. The normal flow is then projected in the direction of the contour normal so that the error can be minimized using robust M-Estimators [15]. For the model-free cue, Harris corners are extracted and tracked for visible parts of the object by minimizing the Sum of Squared Differences (SSD) in RGB values. The algorithm is able to deal not only with polyhedral objects, but also with spherical, cylindrical, and conical objects. However, in contrast to our work, this approach only makes use of texture information available on the object, and hence is not capable of dealing with non-solid objects such as power pylons.

In this work, we employ a state-of-the-art VSLAM system named Parallel Tracking And Mapping (PTAM) [18]. PTAM is able to estimate the camera pose in a formerly unknown scene. It splits tracking and mapping into two separate tasks, which are processed in parallel. Thus, global optimization techniques can be applied, resulting in an accurate and real-time capable tracking system. PTAM is even capable of dealing with reasonably fast camera motions without being given further sensor measurements. As PTAM is on the one hand real-time capable, and on the other hand able to deal with camera transformations over multiple scales by applying image pyramids, it is perfectly suited for our needs of online model-based multi-scale pose estimation. Our work therefore extends this keypoint-based approach with model-based information as discussed in the remainder of this paper.

3. Our Approach

We extend the VSLAM framework of Klein and Murray [18] by several model-based components. First, we use the CAD model of the object to generate a metrically correct map during initialization. Hence, a metrically correct pose is determined in every frame and can be fused with other systems such as the Global Positioning System (GPS). Some user interaction is required during initialization; however, this allows us to avoid object-dependent pose ambiguities, i.e., multiple poses for which the object appears the same, right from the beginning. After the keypoint-based tracking has been initialized, the object is actually tracked only by its surroundings. On the one hand, this has the advantage that the pose can be estimated over multiple scales and that the pose estimation is robust to partial occlusions of the object. Additionally, pose estimation is even possible when the object vanishes completely, which is not possible with pose estimation approaches based solely on the object. On the other hand, given a good prior pose estimate from keypoint-based tracking, we can use the knowledge about the object's geometry to refine the pose, which is our second extension. Keypoint-based pose estimation works well when enough discriminative world information is detected in the current image. Finally, when thinking for instance of inspection tasks, close-up views of the object are sometimes necessary. When being very close to an object, keypoint-based tracking often fails; however, the model is

very prominent in such cases. As a third extension, we therefore introduce a purely model-based tracking component. Model-based tracking bridges the gap until already explored areas are revisited, where we can seamlessly switch back to keypoint-based tracking. An overview of the entire system is given in Figure 1.

3.1. Metric Initialization

Detailed information about a single object in the scene is given in the form of a CAD model, which is defined using quadrangles rather than triangles. This information is used to initialize the VSLAM algorithm metrically. To determine the first camera pose C1, the user has to choose n correspondences between 3D model points and 2D image points. Then, a Perspective-n-Point (PnP) problem is solved using Zhang's method [26]. While the user translates the camera sideways, salient image points are tracked. From these correspondences we estimate a homography between C1 and C2, which is further decomposed into a rotation R and a relative translation vector $\hat{t}$ using [12]. Finally, the user has to choose another previously selected model point. Now that two image locations of one and the same point, namely $x_{C_1}$ and $x_{C_2}$, and the relative camera translation $\hat{t}$ and rotation R between the two views $C_1$ and $\hat{C}_2$ are known, the corresponding 3D model point $\hat{X}_{C_1}$ can be triangulated in the coordinate frame of camera C1. After the 3D model point X has been transformed to the same coordinate frame, the disparity in length between the true $X_{C_1}$ and the reconstructed 3D model point $\hat{X}_{C_1}$ can be calculated. The scale factor

$$s = \frac{\|X_{C_1}\|}{\|\hat{X}_{C_1}\|} \qquad (1)$$

can be derived using similar triangles, where $\|\cdot\|$ denotes the length of the vector. The factor is then used to scale the baseline to its correct metric length, yielding the true metric translation vector $t = s\hat{t}$. This procedure is depicted in Figure 2. As the true baseline is now available, a metrically correct map is generated by triangulation of pairwise corresponding image features.
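To make the scale recovery concrete, the following sketch illustrates Equation 1 in Python/NumPy. It is our own illustration, not the authors' code: the function names, the linear (DLT) triangulation, and the calling convention are assumptions.

```python
import numpy as np

def triangulate_point(x1, x2, R, t_hat):
    """Linearly triangulate one point in the frame of camera C1.

    x1, x2 : normalized image coordinates (x, y, 1) of the same model point in C1 and C2.
    R, t_hat : relative rotation and unit-scale translation of C2 w.r.t. C1,
               i.e. a point maps as X_C2 = R @ X_C1 + t_hat.
    """
    P1 = np.hstack([np.eye(3), np.zeros((3, 1))])         # C1 = [I | 0]
    P2 = np.hstack([R, np.asarray(t_hat).reshape(3, 1)])  # C2 = [R | t_hat]
    # Standard DLT: each view contributes two rows of A, with A @ X_h = 0.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X_h = Vt[-1]
    return X_h[:3] / X_h[3]                               # reconstructed point X_hat in the C1 frame

def metric_scale(X_true_C1, X_hat_C1):
    """Equation (1): ratio of the true to the reconstructed point length in the C1 frame."""
    return np.linalg.norm(X_true_C1) / np.linalg.norm(X_hat_C1)

# The metric baseline then follows as t = metric_scale(X_true_C1, X_hat_C1) * t_hat.
```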

Figure 1. System overview. After metric initialization using the model, the object is tracked by its surroundings and the pose is refined using model edges. When this approach fails, model-based tracking takes over. It seamlessly switches back to background-based tracking when already explored areas are revisited. The gray blocks depict the extensions compared to PTAM [18].

Figure 2. C1 is determined by solving a PnP problem. R and the relative translation $\hat{t}$ are extracted from an estimated homography, while the correct location of C2 is determined by rescaling the baseline. The scaling factor is determined from the disparity in length between a reconstructed $\hat{X}$ and the true model point X.

3.2. Model-Based Refinement and Tracking

We distinguish between model-based refinement, which is applied when keypoint-based tracking is good, and purely model-based tracking in all other cases. Both methods work the same, but during model-based tracking the prior pose estimate is not delivered by keypoint-based tracking but results from the refined pose of the last frame. In both cases the same steps are necessary. First, all model points are projected to the image and a bounding box is calculated. A refinement is only triggered if the ratio of the areas exceeds a certain threshold t,

$$\frac{A_{\text{bounding box}}}{A_{\text{image}}} > t. \qquad (2)$$

Otherwise, the object is either not present in the image at all, or so small that a refinement based on the object's model would not improve the camera's pose notably.
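As a small illustration of the gating test in Equation 2, the following sketch checks the area ratio before triggering a refinement. The naming and interfaces are our own assumptions; the paper only specifies the threshold t = 0.2.

```python
import numpy as np

def refinement_triggered(projected_pts, image_shape, t=0.2):
    """Equation (2): refine only if the projected model's bounding box
    covers a sufficient fraction of the image.

    projected_pts : (N, 2) array of model points projected to pixel coordinates.
    image_shape   : (height, width) of the current frame.
    t             : area-ratio threshold (fixed to 0.2 in the experiments).
    """
    x_min, y_min = projected_pts.min(axis=0)
    x_max, y_max = projected_pts.max(axis=0)
    area_bbox = max(x_max - x_min, 0.0) * max(y_max - y_min, 0.0)
    area_image = float(image_shape[0] * image_shape[1])
    return area_bbox / area_image > t
```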

As the refinement is based on gradient information, the current image is convolved with the derivative of a Gaussian kernel $G_x(x, y, \sigma_c)$ in x-direction and $G_y(x, y, \sigma_c)$ in y-direction, giving the image derivatives $I_x$ and $I_y$, respectively. The previously calculated bounding box is increased by a certain percentage, and the gradient magnitude $|\nabla I(x, y)|$ as well as the gradient direction $\psi(x, y)$ are extracted for this region of interest only. Then, additional model points are interpolated on the edges connecting each pair of model points, and their visibility is determined using a rendering component. To this end, the CAD model is rendered under its currently estimated pose $E^*_{CW}$ and the z-buffer is retrieved. The 3D world points $X_W$ are transformed to the camera coordinate frame via

$$X_C = E^*_{CW} X_W. \qquad (3)$$

Then, the 3D point $X_C$ is projected from the camera coordinate frame to normalized clipping coordinates $(x_{ncc}, y_{ncc}, z_{ncc})^T$ using the OpenGL camera projection matrix which has been applied during rendering, and further linearly transformed to image coordinates $(u_{ncc}, v_{ncc})^T$. The z-buffer is finally evaluated at these coordinates to retrieve the correct depth value for the 3D point. If the difference satisfies

$$|z_{ncc} - z_{\text{buffer}}(u_{ncc}, v_{ncc})| < \tau, \qquad (4)$$

where $z_{\text{buffer}}(u_{ncc}, v_{ncc})$ is the corresponding z-buffer value and $\tau$ a scalar threshold, the 3D point $X_W$ is considered visible; otherwise it is considered invisible.
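A minimal sketch of the z-buffer visibility test of Equations 3 and 4 follows. It is illustrative only: the OpenGL-style projection matrix, the depth-range convention, and the variable names are assumptions rather than details taken from the paper.

```python
import numpy as np

def point_visible(X_w, E_cw, P_gl, zbuffer, tau=0.025):
    """Visibility test of Equations (3)-(4) against a rendered z-buffer.

    X_w     : 3D model point in world coordinates.
    E_cw    : 4x4 pose matrix mapping world to camera coordinates (Eq. 3).
    P_gl    : 4x4 OpenGL-style projection matrix used during rendering.
    zbuffer : (H, W) depth buffer retrieved after rendering the CAD model.
    tau     : scalar visibility threshold (0.025 in the experiments).
    """
    X_c = E_cw @ np.append(X_w, 1.0)              # Eq. (3): world -> camera frame
    clip = P_gl @ X_c                             # camera -> clip space
    ncc = clip[:3] / clip[3]                      # normalized clipping coordinates in [-1, 1]
    h, w = zbuffer.shape
    u = int(round((ncc[0] + 1.0) * 0.5 * (w - 1)))        # linear map to pixel coordinates
    v = int(round((1.0 - (ncc[1] + 1.0) * 0.5) * (h - 1)))
    if not (0 <= u < w and 0 <= v < h):
        return False
    z_ncc = (ncc[2] + 1.0) * 0.5                  # assuming a [0, 1] depth-buffer range
    return abs(z_ncc - zbuffer[v, u]) < tau       # Eq. (4)
```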

The technique we use to refine our pose is based on the edge-based tracking algorithm proposed by Drummond and Cipolla [10], which has already been applied successfully in several augmented reality applications [17, 23]. For all model points, including the interpolated ones, a linear search for the strongest gradient magnitude is performed as depicted in Figure 3. The search line $\eta$ is arranged perpendicular to the model's edge.

Figure 3. For each projected model point, a linear search perpendicular to the model's edge is performed. The image point is thereby shifted along the search line $\eta$, while looking for a strong image gradient with approximately the same direction. The gradient magnitude is weighted with a Gaussian windowing function centered at $\mu$, where $\sigma^2$ controls the expansion of the search region.

The projected model point is now shifted along the perpendicular search line to the image point $(x, y)$ which maximizes

$$\max_{(x,y)} \left( |\nabla I(x, y)| \, e^{-\frac{(v-\mu)^2}{2\sigma^2}} - \lambda \, \angle\big(\psi(x, y), \eta\big) \right), \qquad (5)$$

where the image gradient magnitude $|\nabla I(x, y)|$ is weighted with a Gaussian windowing function centered at the projected model point's position $\mu$, the scalar $v$ denotes the distance along the search line, and $\angle(n_1, n_2)$ is a term that penalizes orientation deviations between the perpendicular search direction $\eta$ and the image gradient direction $\psi(x, y)$. In short, we are looking for the image location nearest to the projected model point with the strongest gradient response and approximately the same direction.
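The perpendicular search of Equation 5 can be sketched as a one-dimensional scan along the edge normal. This is an illustrative reimplementation with assumed interfaces (precomputed gradient maps, a fixed search length); since the distance v is measured from the projected model point itself, the Gaussian term reduces to exp(-v²/2σ²).

```python
import numpy as np

def search_along_normal(grad_mag, grad_dir, p, n_hat, eta_dir,
                        search_len=15, sigma=5.0, lam=0.8):
    """Shift a projected model point along its edge normal, maximizing Eq. (5).

    grad_mag, grad_dir : per-pixel gradient magnitude |grad I| and direction psi.
    p        : projected model point (x, y), the center of the Gaussian window.
    n_hat    : unit edge normal, i.e. the direction of the search line eta.
    eta_dir  : orientation of the search line, used for the angle penalty.
    """
    p = np.asarray(p, dtype=float)
    n_hat = np.asarray(n_hat, dtype=float)
    h, w = grad_mag.shape
    best_score, best_pt = -np.inf, None
    for v in range(-search_len, search_len + 1):          # v = signed distance along eta
        x, y = np.round(p + v * n_hat).astype(int)
        if not (0 <= x < w and 0 <= y < h):
            continue
        # Wrapped angular deviation between gradient direction and search direction.
        ang = abs(np.arctan2(np.sin(grad_dir[y, x] - eta_dir),
                             np.cos(grad_dir[y, x] - eta_dir)))
        score = grad_mag[y, x] * np.exp(-(v ** 2) / (2.0 * sigma ** 2)) - lam * ang
        if score > best_score:
            best_score, best_pt = score, np.array([x, y], dtype=float)
    return best_pt   # image point maximizing Eq. (5), or None if the line leaves the image
```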

The image distances between the old and new positions are assigned to an error vector e. We then calculate a motion vector $\mu$ which minimizes these image errors by solving the equation system

$$J\mu = e, \qquad (6)$$

where J is a Jacobian matrix describing the effect of each element of $\mu$ on each element of e in the direction perpendicular to the edge. The elements of J are therefore calculated by taking the partial derivatives at the current pose along the edge normal as

$$J_{ij} = \frac{\partial e_i}{\partial \mu_j} = \left\langle \hat{n}_i, \begin{pmatrix} \frac{\partial u}{\partial \mu_j} \\ \frac{\partial v}{\partial \mu_j} \end{pmatrix} \right\rangle, \qquad (7)$$

where $(u, v)^T$ are the image coordinates of the i-th projected model point, $\hat{n}_i$ is the corresponding unit-length edge normal, and $\langle x, y \rangle$ denotes the dot product. A world point $p^i_W$ is projected to the image coordinates $(u, v)^T$ by first transforming it to the camera coordinate frame using the current pose estimate $E^*_{CW}$ and then performing the camera projection as

$$\begin{pmatrix} u_i \\ v_i \end{pmatrix} = \text{CamProj}(E^*_{CW} \, p^i_W) = \begin{pmatrix} f_x & 0 & p_x \\ 0 & f_y & p_y \end{pmatrix} \begin{pmatrix} X_C / Z_C \\ Y_C / Z_C \\ 1 \end{pmatrix}. \qquad (8)$$

The radial lens distortion can be ignored here, as the images are undistorted prior to applying our algorithm. Thus, Equation 8 has to be differentiated with respect to the motion parameter vector $\mu$ by applying the chain rule.
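Equation 8 translates directly into code; the short sketch below assumes the intrinsics fx, fy, px, py are known from the prior calibration and that the input images are already undistorted (function name and interface are our own).

```python
import numpy as np

def cam_proj(E_cw, p_w, fx, fy, px, py):
    """Equation (8): transform a world point with the current pose estimate
    and project it with the undistorted pinhole model."""
    X_c, Y_c, Z_c = (E_cw @ np.append(p_w, 1.0))[:3]
    return np.array([fx * X_c / Z_c + px,
                     fy * Y_c / Z_c + py])
```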

The camera's pose $E^*_{CW}$ is represented by a 4×4 matrix of the Special Euclidean Lie Group SE(3) [25], which may be parameterized by a six-dimensional motion vector $\mu$ via the exponential map, where $\mu_1$, $\mu_2$, and $\mu_3$ represent the translation along the x, y, and z axes, and $\mu_4$, $\mu_5$, and $\mu_6$ describe the rotation around these axes, respectively. An SE(3) matrix E is calculated from a six-dimensional motion vector $\mu$ as

$$E = \exp\left( \sum_{i=1}^{6} \mu_i G_i \right), \qquad (9)$$

where the $G_i$ are called generator matrices. All operations on the SE(3) group are continuous and smooth, and hence differentiable in closed form. Differentiating a motion matrix E at the origin $\mu = 0$, the partial derivatives

$$\frac{\partial E}{\partial \mu_i} = G_i \qquad (10)$$

are simply the generator matrices. Given all necessary mathematical expressions, Equation 6 may be solved for the motion-update vector $\mu$ using a simple least-squares solution

$$\mu = J^{\dagger} e, \qquad (11)$$

with $J^{\dagger} = (J^T J)^{-1} J^T$ being the pseudo-inverse. However, least-squares solutions are not robust to outliers. As gross outliers are very likely to occur due to noise, occlusions, or falsely matched points during the perpendicular search, solving the above equation system in a least-squares manner is not appropriate. Therefore, a robust estimate of $\mu$ is computed using Iteratively Reweighted Least-Squares (IRLS) by estimating

$$\mu' = \arg\min_{\mu} \sum_i w_{\text{TUK}}(\|e_i\|) \, \|e_i\|^2 \qquad (12)$$

in each iteration. The weights w are obtained using a Tukey M-Estimator [16]. Thus, the prior pose $E^*_{CW}$ is updated as

$$E_{CW} = \exp(\mu') \, E^*_{CW}. \qquad (13)$$

An example of a refinement using ten iterations is given in Figure 4.

Figure 4. The left image shows the result of the first iteration, while the right one depicts the last iteration and thus the final result. The pose of the prior iteration is shown in yellow, the refined pose in green, and the error vectors are depicted by white lines. Best viewed in color.
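The robust update step of Equations 6 to 13 can be summarized in a compact sketch. This is a simplified reimplementation under our own assumptions: the Jacobian of Equation 7 is built numerically from the SE(3) generators rather than in closed form, the Tukey tuning constant c = 4.685 is a conventional choice not reported in the paper, and `project(E, X)` stands for the camera projection of Equation 8 (e.g. the cam_proj sketch above with fixed intrinsics).

```python
import numpy as np
from scipy.linalg import expm

# Generators of SE(3): translation along x, y, z and rotation about x, y, z (Eq. 9).
G = np.zeros((6, 4, 4))
G[0][0, 3] = G[1][1, 3] = G[2][2, 3] = 1.0
G[3][1, 2], G[3][2, 1] = -1.0, 1.0
G[4][0, 2], G[4][2, 0] = 1.0, -1.0
G[5][0, 1], G[5][1, 0] = -1.0, 1.0

def tukey_weights(r, c=4.685):
    """Tukey biweight function providing the IRLS weights of Eq. (12)."""
    w = np.zeros_like(r)
    inside = np.abs(r) < c
    w[inside] = (1.0 - (r[inside] / c) ** 2) ** 2
    return w

def irls_pose_step(E_cw, pts_w, normals, errors, project, eps=1e-4):
    """One IRLS refinement step (Eqs. 6-13).

    pts_w   : visible (interpolated) 3D model points in world coordinates.
    normals : (N, 2) unit edge normals n_i of the projected edges.
    errors  : (N,) perpendicular distances e_i found by the search of Eq. (5).
    project : project(E, X) -> (u, v), the camera projection of Eq. (8).
    The caller re-measures the error vector with the perpendicular search and
    repeats (about ten iterations in the paper's example).
    """
    normals = np.asarray(normals, dtype=float)
    errors = np.asarray(errors, dtype=float)
    base = np.array([project(E_cw, X) for X in pts_w])
    J = np.zeros((len(pts_w), 6))
    for j in range(6):                                     # Eq. (7) via finite differences
        E_pert = expm(eps * G[j]) @ E_cw
        pert = np.array([project(E_pert, X) for X in pts_w])
        J[:, j] = np.einsum('ij,ij->i', normals, (pert - base) / eps)
    w = np.sqrt(tukey_weights(np.abs(errors)))             # weighting realizes Eq. (12)
    mu = np.linalg.lstsq(J * w[:, None], errors * w, rcond=None)[0]
    return expm(np.tensordot(mu, G, axes=1)) @ E_cw        # Eqs. (9), (13): exponential update
```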

4. Experiments and Results

Several experiments with solid objects as well as wiry objects have been performed. Our implementation runs in real time at 15-20 frames per second (FPS), exploiting the computational resources of today's multi-core processors. We have evaluated our adaptations in terms of tracking accuracy by comparison to ground truth acquired with an outside-in tracking system. All evaluation tasks are performed on an Intel Core i5 2.66 GHz processor. For the translational error, the Euclidean distance

$$t_{err} = \|t - t_{true}\| \qquad (14)$$

between the two camera centers is used, while for the rotation an attitudinal error measure based on a quaternion representation has been chosen. A quaternion is defined as $q = (\rho^T, q_w)^T$, with $\rho = (q_x, q_y, q_z)^T = \hat{e} \sin\frac{\theta}{2}$ and $q_w = \cos\frac{\theta}{2}$, where $\hat{e}$ is the axis of rotation and $\theta$ is the angle of rotation [24]. Thus, the angular difference between two quaternions $q_{true}$ and q can be calculated from $q_w$ as

$$\theta_{err} = 2 \arccos\left( \|q_{true}^{-1} q\|_w \right), \qquad (15)$$

where $\|\cdot\|_w$ is an operation that first normalizes the quaternion resulting from the quaternion multiplication and then extracts $q_w$. The parameters have been fixed to t = 0.2, $\sigma_c$ = 0.7, $\lambda$ = 0.8, $\tau$ = 0.025, and an enlargement of the object bounding box by 30% for all experiments.
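For reference, the two error measures of Equations 14 and 15 can be written as follows. This is a sketch; the (x, y, z, w) quaternion ordering and the helper names are our own conventions, not taken from the paper.

```python
import numpy as np

def translational_error(t, t_true):
    """Equation (14): Euclidean distance between the two camera centers."""
    return np.linalg.norm(np.asarray(t) - np.asarray(t_true))

def quat_mult(a, b):
    """Hamilton product of two quaternions given in (x, y, z, w) order."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return np.array([aw * bx + ax * bw + ay * bz - az * by,
                     aw * by - ax * bz + ay * bw + az * bx,
                     aw * bz + ax * by - ay * bx + az * bw,
                     aw * bw - ax * bx - ay * by - az * bz])

def rotational_error(q_true, q):
    """Equation (15): angle of the relative rotation between q_true and q."""
    q_true_inv = np.array([-q_true[0], -q_true[1], -q_true[2], q_true[3]])  # unit-quaternion inverse
    d = quat_mult(q_true_inv, q)
    d = d / np.linalg.norm(d)          # the ||.||_w operation: normalize, then extract q_w
    return 2.0 * np.arccos(np.clip(d[3], -1.0, 1.0))
```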

Evaluation results for a solid object are presented in Table 1; those for a wiry object are listed in Table 2 and depicted in Figure 5. The results show that our algorithm clearly outperforms standard PTAM. While this is partly due to model-based refinement during tracking, the major improvement is caused by the refinement during initialization. In standard PTAM, the relative motion between the first and second keyframe is estimated from the piecewise planar environment using [12], which is not robust. However, when the model information is incorporated into the initialization process to refine the camera poses for the first and the second keyframe, standard PTAM is almost as accurate as our implementation. The accuracy gained by constantly refining the pose during tracking using the model is quantitatively significant, but is only essential when being very close to the object in our indoor experiments. Outdoors, however, the pose estimate provided by PTAM is noisier and model-based refinement is therefore beneficial. Qualitative results of our implementation for different wiry objects are shown in Figures 6 and 7.

        transl. error [cm]     rot. error [deg]
        µ     ±σ    max        µ     ±σ    max
PTAM    6.1   3.4   17.7       1.2   0.4   3.7
Ours    1.7   0.8    6.4       1.0   0.4   3.2

Table 1. The accuracy of the standard PTAM implementation is compared against our implementation with model-based refinement for a solid object.

        transl. error [cm]     rot. error [deg]
        µ     ±σ    max        µ     ±σ    max
PTAM    6.9   2.9   32.0       1.4   1.3   11.3
Ours    2.6   3.1   31.3       1.6   1.3   11.7

Table 2. The accuracy of the standard PTAM implementation is compared against our implementation with model-based refinement for a wiry object.

The main benefit of our implementation is that the combination of the keypoint-based tracking procedure with model-based tracking can maintain tracking in cases where standard PTAM would lose track. This is especially important when capturing close-up views of parts of the object, as is often necessary during inspection tasks. We have simulated such close-up views using the wiry wooden box and compared the standard PTAM implementation against our implementation.

Figure 6. Qualitative results of our implementation for the wiry object used during evaluation. The pose is estimated from keypoint-based tracking and then refined using the available information about the object’s geometry.

Figure 7. Qualitative results of our implementation for an outdoor scene with a wooden model of a power pylon.

Figure 5. Comparison of the tracked trajectories (blue) to ground truth (green) for the wiry object. PTAM (left) has a considerable offset, whereas our implementation (right) aligns well. Best viewed in color.

As can be seen from the results summarized in Table 3, loss of tracking could always be prevented for the 2:20 minute evaluation sequence (4110 frames) when using model-based tracking for failure recovery. This was not possible with the standard PTAM implementation, where tracking is completely lost in 20.9% of the frames. Moreover, we could not only maintain tracking over longer periods of time, but even avoid further loss of frames by the use of model-based tracking. The qualitative results in Figure 8 are very promising. A video showing additional results is available online.1

        frames lost      frames recovered
PTAM    860 (20.9%)      0 (0.0%)
Ours    272 (6.6%)       272 (100.0%)

Table 3. Model-based tracking overcomes tracking failures when keypoint-based tracking fails.

1 http://aerial.icg.tugraz.at

5. Conclusion

We have presented an online model-based multi-scale pose estimation approach which builds upon a state-of-the-art VSLAM implementation and integrates prior knowledge of the object of interest throughout the entire tracking procedure. By exploiting discriminative features in the background we are able to estimate the pose of the camera relative to complex structures such as power pylons, and we are even able to determine a pose when the entire object is occluded. Beyond using the model for metric initialization, we employ a refinement step which improves the accuracy of the pose estimate delivered by the SLAM system. Furthermore, when keypoint-based tracking fails, as for close-up inspection, purely model-based tracking takes over and successfully reduces tracking failures.

Acknowledgments

This work has been supported by the Austrian Research Promotion Agency (FFG) project FIT-IT Pegasus (825841/10397).

Figure 8. When not enough features for tracking are available, we automatically switch to a model-based tracking stage in order to avoid tracking failure. Some qualitative results are shown here for ten continuous frames. Best viewed in color.

References

[1] H. Bay, T. Tuytelaars, and L. V. Gool. SURF: Speeded Up Robust Features. In Proc. 9th Eur. Conf. Comp. Vis., pages 404–417, Graz, AT, 2006.
[2] H. Bischof and A. Leonardis. Robust Recognition of Scaled Eigenimages Through a Hierarchical Approach. In Proc. IEEE Conf. Comp. Vis. Pattern Recognit., pages 664–670, Santa Barbara, CA, 1998.
[3] G. Bleser, H. Wuest, and D. Stricker. Online Camera Pose Estimation in Partially Known and Dynamic Scenes. In Proc. 5th IEEE Int. Symposium Mixed Augmented Reality, pages 56–65, Santa Barbara, CA, 2006.
[4] H. Borotschnig, L. Paletta, M. Prantl, and A. Pinz. Appearance-Based Active Object Recognition. Image Vis. Comput., 18(9):715–727, 2000.
[5] M. Byne and J. Anderson. A CAD-Based Computer Vision System. Image Vis. Comput., 16(8):533–539, 1998.
[6] P. David, D. F. DeMenthon, R. Duraiswami, and H. Samet. SoftPOSIT: Simultaneous Pose and Correspondence Determination. In Proc. 7th Eur. Conf. Comp. Vis., pages 79–95, Copenhagen, DK, 2002.
[7] A. J. Davison. Real-Time Simultaneous Localisation and Mapping with a Single Camera. In Proc. 9th Int. Conf. Comp. Vis., pages 1403–1410, Nice, FR, 2003.
[8] A. J. Davison, I. D. Reid, N. D. Molton, and O. Stasse. MonoSLAM: Real-Time Single Camera SLAM. IEEE Trans. Pattern Anal. Mach. Intell., 29(6):1052–1067, 2007.
[9] D. F. DeMenthon and L. S. Davis. Model-Based Object Pose in 25 Lines of Code. Int. J. Comp. Vis., 15(1):123–141, 1995.
[10] T. Drummond and R. Cipolla. Visual Tracking and Control Using Lie Algebras. In Proc. IEEE Conf. Comp. Vis. Pattern Recognit., pages 652–657, Fort Collins, CO, 1999.
[11] T. Drummond and R. Cipolla. Real-Time Visual Tracking of Complex Structures. IEEE Trans. Pattern Anal. Mach. Intell., 24(7):932–946, 2002.
[12] O. D. Faugeras and F. Lustman. Motion and Structure from Motion in a Piecewise Planar Environment. Int. J. Pattern Recognit. Artif. Intell., 2(3):485–508, 1988.
[13] M. A. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. Assoc. Comput. Mach., 24(6):381–395, 1981.
[14] S. Hinterstoisser, S. Benhimane, and N. Navab. N3M: Natural 3D Markers for Real-Time Object Detection and Pose Estimation. In Proc. 11th Int. Conf. Comp. Vis., pages 1–7, Rio de Janeiro, BR, 2007.
[15] P. J. Huber. Robust Estimation of a Location Parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964.
[16] P. J. Huber. Robust Statistics. John Wiley & Sons, New York, 1981.
[17] G. Klein and T. Drummond. Robust Visual Tracking for Non-Instrumented Augmented Reality. In Proc. 2nd IEEE Int. Symposium Mixed Augmented Reality, pages 113–122, Tokyo, JP, 2003.
[18] G. Klein and D. Murray. Parallel Tracking and Mapping for Small AR Workspaces. In Proc. 6th IEEE Int. Symposium Mixed Augmented Reality, pages 225–234, Nara, JP, November 2007.
[19] V. Kyrki and D. Kragic. Tracking Rigid Objects Using Integration of Model-Based and Model-Free Cues. Mach. Vis. Appl., 22(1):323–335, 2011.
[20] A. Leonardis and H. Bischof. Dealing with Occlusions in the Eigenspace Approach. In Proc. IEEE Conf. Comp. Vis. Pattern Recognit., pages 453–458, San Francisco, CA, 1996.
[21] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. Int. J. Comp. Vis., 60(2):91–110, 2004.
[22] F. Moreno-Noguer, V. Lepetit, and P. Fua. Pose Priors for Simultaneously Solving Alignment and Correspondence. In Proc. 10th Eur. Conf. Comp. Vis., pages 405–418, Marseille, FR, 2008.
[23] G. Reitmayr and T. Drummond. Going Out: Robust Model-Based Tracking for Outdoor Augmented Reality. In Proc. 5th IEEE Int. Symposium Mixed Augmented Reality, pages 109–118, Santa Barbara, CA, 2006.
[24] M. D. Shuster. A Survey of Attitude Representations. J. Astron. Sci., 41(4):439–517, 1993.
[25] V. S. Varadarajan. Lie Groups, Lie Algebras and Their Representations. Springer, 1984.
[26] Z. Zhang. A Flexible New Technique for Camera Calibration. IEEE Trans. Pattern Anal. Mach. Intell., 22(11):1330–1334, 2000.
