A Markerless Registration Method for Augmented Reality based on Affine Properties

Y. Pang*, M.L. Yuan∞, A.Y.C. Nee*∞, S.K. Ong∞, Kamal Youcef-Toumi*§
*Singapore-MIT Alliance
§Massachusetts Institute of Technology
∞National University of Singapore
{pangyan|mpeyml|mpeneeyc|mpeongsk}@nus.edu.sg, [email protected]

Abstract

This paper presents a markerless registration approach for Augmented Reality (AR) systems based on the Kanade-Lucas-Tomasi (KLT) natural feature tracker and the affine reconstruction and reprojection techniques. First, the KLT tracker is used to track the corresponding feature points in two control images. Next, four planar points are specified in each control image to set up the world coordinate system. The affine reconstruction and reprojection techniques are used to estimate the image projections of these four specified planar points in the live video sequence, and these image projections are used to estimate the camera pose in real time. A primitive approach that illustrates the basic idea is discussed first, and a robust method is then given to improve the resistance to noise. Pre-defined markers are not required in the proposed approach, and the virtual models can still be registered at the proper positions even if the specified region is occluded during the registration process.

Keywords: Augmented Reality, Registration, Affine Reprojection, Affine Reconstruction, KLT Tracker

Copyright © 2006, Australian Computer Society, Inc. This paper appeared at the Seventh Australasian User Interface Conference (AUIC2006), Hobart, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 50. Wayne Piekarski, Ed. Reproduction for academic, not-for-profit purposes permitted provided this text is included.

1 Introduction

Augmented Reality (AR), which by definition is a real environment "augmented" with virtual objects, has become a promising technology to support three-dimensional (3D) tasks. Registration is one of the most important issues in AR systems and currently poses some limitations to different AR applications (Azuma 1997, 2001). The registration process superimposes virtual objects onto a real scene using information extracted from the scene. Specifically, this information consists of the feature points of the real scene detected using some tracking techniques. In terms of tracking techniques, there are two categories of registration approaches: sensor-based (mechanical, magnetic, ultrasonic, optical) approaches and computer vision-based approaches. In sensor-based approaches, there is a need to calibrate the external sensors, and current sensor-based equipment is either bulky or expensive, or lacks satisfactory accuracy. On the other hand, computer vision-based methods can avoid the calibration of external sensors and offer the potential for accurate tracking without bulky and costly equipment.

Computer vision-based approaches can be categorized into two types in terms of camera calibration requirements. The first type does not require the camera parameters (either intrinsic or extrinsic) to be calibrated in advance, a process that typically involves the use of a known 3D calibration object. One way to solve this type of problem is to compute the camera parameters and 3D structures in the Euclidean space directly from the information contained in the image sequences. For example, Ong et al. (1998) assumed an orthographic camera model and developed an AR approach by reconstructing some unorganized 3D points. An alternative is to perform the AR task in the projective or affine space rather than in the Euclidean space. Seo et al. (2000) used the projective motion of the real camera, recovered from corresponding image points, to obtain the intrinsic and extrinsic parameters of a virtual camera aligned with the real camera during motion. However, this method cannot guarantee the existence of a virtual camera, because the projective matrix may yield no solution. Kutulakos et al. (1998) presented a calibration-free AR method, without the need for a Euclidean calibration of the real camera, by applying an affine object representation. The affine representation of the virtual objects allows one to compute the projection of an object without requiring the camera calibration information or the Euclidean 3D structure of the environment. However, the affine representation of virtual objects has some disadvantages (Seo et al. 2000, Kutulakos et al. 1998). Although a number of researchers have recently tried to develop self-calibration techniques, no self-calibration algorithm is robust in the general case.

The second type of approaches assumes that the intrinsic camera parameters are pre-calibrated.

This is a popular assumption in most current AR systems. ARToolkit, developed by Kato et al. (1999), is the most well-known computer vision-based library for the rapid development of AR applications. In ARToolkit, a square marker of known dimensions is used to define the World Coordinate System (WCS), and the camera extrinsic parameters, i.e., the camera pose, are estimated using the four vertices of the marker. Since the hardware setup for ARToolkit is very simple, many existing AR systems have been developed based on the ARToolkit library (Zauner et al. 2003, Frund et al. 2003, Thomas et al. 2000). However, in ARToolkit, if the marker is not visible or is partially occluded, the registration process will fail. In addition, ARToolkit requires a pre-defined marker in the real scene, which is inflexible for outdoor applications. Based on the work of Seo et al. (2000) and Kato et al. (1999), Yuan et al. (2005) proposed a method by which a user can specify four points, which form an approximate square, to define the WCS. The projections of these four specified points are computed in the live video sequence and used to estimate the camera pose. In this method, the projective reconstruction technique is used to set up an updated projective transformation between the image points and the 3D projective space. However, experiments showed that this method is unstable when the camera moves.

In this paper, motivated by the research of Kutulakos et al. (1998), Kato et al. (1999) and Yuan et al. (2005), a markerless registration method is proposed based on the Kanade-Lucas-Tomasi (KLT) tracker (Tomasi et al. 1991) and affine properties. There are three main stages in the proposed method. The first stage is the affine reconstruction stage. In this stage, two control images are selected from the video sequence and the KLT tracker is used to extract the natural feature points. Next, the Affine Coordinate System (ACS) is defined using these natural feature points. Four planar points for setting up the Euclidean WCS are specified by the user in the two control images respectively, and the affine coordinates of the specified points are reconstructed using the affine reconstruction technique. The second stage is the reprojection stage. The corresponding affine reprojection matrix in the live video frame is computed using the natural feature points tracked by the KLT tracker. The image projections of the four specified points, which have been defined in the first stage, are estimated in the live video sequence using the affine reprojection matrix. In the third stage, the camera extrinsic parameters, i.e., the camera pose, are estimated from the four specified points obtained in the second stage. Lastly, the virtual objects can be rendered on the real scene using graphics pipeline techniques, e.g., OpenGL.

The main differences between our work and that of Kutulakos et al. (1998) are: 1) Our method represents the virtual objects in Euclidean space instead of in affine space. The metric information of the virtual objects, such as the distance between an object's vertices, can be obtained, unlike in Kutulakos's method. 2) The work reported by Kutulakos et al. used the affine camera model, which is unable to generate a perspective view. Our approach uses the perspective pinhole camera model, which is more common in practice. 3) The method proposed by Kutulakos et al. does not decompose the affine camera into intrinsic and extrinsic components, and this makes it impossible for the AR system to generate shading and shadow effects in the graphics view. In the method proposed in this paper, it is assumed that the intrinsic parameters are known in advance, which is reasonable in most cases. The extrinsic parameters of the camera are estimated in real time in the Euclidean WCS. This enables users to specify lighting and material properties to create shaded objects and their shadows. Unlike the method reported by Yuan et al. (2005), the method proposed in this paper uses affine techniques instead of projective techniques to estimate the image projections of the specified points, and it is relatively simple and easy to implement. Compared to the work of Yuan et al. (2005), the proposed approach is more stable, and the registration process can be achieved as long as at least four feature points are tracked, rather than the at least six feature points needed previously. In addition, by using the KLT natural feature tracker, the proposed approach can perform the registration process in a markerless environment.

Recently, many markerless registration approaches have been reported. Chia et al. (2002) used two reference views to compute the current camera pose by minimizing cost functions of epipolar geometry constraints on natural feature correspondences. A local moving edge tracker (Comport et al. 2003) was used to provide real-time tracking of points normal to the object contours, with a non-linear pose computation used for registration. Vacchetti et al. (2004) presented a new approach that combines information from edges and feature points for robust real-time 3D tracking; in their system, pose estimation is obtained using the POSIT algorithm. Najafi et al. (2004) introduced a novel approach for the automated initialization of markerless tracking systems, combining plenoptic viewing, intensity-based registration and iterative closest point techniques to improve the registration accuracy. Gordon et al. (2004) applied the SIFT algorithm for natural feature extraction to achieve markerless tracking and registration in AR systems. In these methods, the camera pose is computed using a non-linear algorithm, and the non-linear optimization is handled using numerical iterative algorithms such as Newton-Raphson or Levenberg-Marquardt. Non-linear methods yield accurate results but are also computationally expensive. Unlike these methods, the method proposed in this paper computes the camera pose using the purely geometric approach presented by Kato et al. (1999). The image projections of the four specified points are estimated in the current

video frame using the affine reprojection technique, and the camera pose is computed from these four specified points. The computation of the proposed method is simpler and faster than that of the non-linear methods, making it suitable for the real-time requirements of AR systems.

The remainder of this paper is organized as follows. Section 2 presents the necessary geometric fundamentals. A primitive registration algorithm is described in Section 3. Section 4 discusses a method to improve the robustness of the primitive approach. Some experimental results are shown in Section 5, Section 6 discusses potential applications in user interfaces, and the final remarks and conclusions are given in Section 7.

2 Preliminaries

2.1 Affine Coordinate System (ACS)

Given a set of points $P_1, \ldots, P_n \in R^3$ ($n \geq 4$), at least four of which are not coplanar, four non-coplanar points of $\{P_1, \ldots, P_n\}$ can be selected to define an ACS. One is the origin and the other three points are the affine basis points. Any other point can be represented using the origin and the affine basis points in the defined ACS.

2.2 ACS Properties

There are two important properties of the ACS (Kutulakos et al. 1998, Koenderink et al. 1991, Weinshall et al. 1995) that can be used for AR registration purposes.

Property 1 (Affine Reconstruction Property): The affine coordinates of any 3D point can be determined using Equation (1) if its projection, i.e., image coordinate, in at least two different views is known and the projections of the affine origin and basis points are also known in those views. Based on this property, the affine coordinates of any 3D point can be computed from its projections and the projections of the affine origin and the basis points in the two control images, as the solution of the following over-determined equation:

$$\begin{bmatrix} u^1 \\ v^1 \\ u^2 \\ v^2 \end{bmatrix} =
\begin{bmatrix}
u^1_{b_1} - u^1_{b_0} & u^1_{b_2} - u^1_{b_0} & u^1_{b_3} - u^1_{b_0} & u^1_{b_0} \\
v^1_{b_1} - v^1_{b_0} & v^1_{b_2} - v^1_{b_0} & v^1_{b_3} - v^1_{b_0} & v^1_{b_0} \\
u^2_{b_1} - u^2_{b_0} & u^2_{b_2} - u^2_{b_0} & u^2_{b_3} - u^2_{b_0} & u^2_{b_0} \\
v^2_{b_1} - v^2_{b_0} & v^2_{b_2} - v^2_{b_0} & v^2_{b_3} - v^2_{b_0} & v^2_{b_0}
\end{bmatrix}
\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \quad (1)$$

where $[X\ Y\ Z]^T$ are the affine coordinates of the 3D point; $[u^j\ v^j]^T$ ($j = 1, 2$) are the image projections of the 3D point in the two control images; and $[u^j_{b_i}\ v^j_{b_i}]^T$ ($j = 1, 2$; $i = 0, \ldots, 3$) are the image projections of the affine origin $P_0$ and the affine basis points ($P_1, P_2, P_3$) in the two control images.

Property 2 (Affine Reprojection Property): When the image projections of the affine origin and basis points are known in an image plane, the projection of a point $P_k$ in this image can be computed from its affine coordinates $[X_k\ Y_k\ Z_k]^T$ using Equation (2):

$$\begin{bmatrix} u_k \\ v_k \\ 1 \end{bmatrix} =
\underbrace{\begin{bmatrix}
u_{b_1} - u_{b_0} & u_{b_2} - u_{b_0} & u_{b_3} - u_{b_0} & u_{b_0} \\
v_{b_1} - v_{b_0} & v_{b_2} - v_{b_0} & v_{b_3} - v_{b_0} & v_{b_0} \\
0 & 0 & 0 & 1
\end{bmatrix}}_{M_{3 \times 4}}
\begin{bmatrix} X_k \\ Y_k \\ Z_k \\ 1 \end{bmatrix} \quad (2)$$

where $[u_k\ v_k\ 1]^T$ is the image projection of $P_k$; $[u_{b_i}\ v_{b_i}]^T$ ($i = 0, \ldots, 3$) are the image projections of the origin $P_0$ and the three basis points $P_1, P_2, P_3$; and the matrix $M_{3 \times 4}$ is known as the affine reprojection matrix. The reprojection property implies that the image projection of a 3D point for any camera position is completely determined by the reprojection matrix, which is defined by the image projections of the affine origin and basis points. Hence, if the affine coordinates of any 3D point are known in advance, its projection can be estimated using the affine reprojection matrix $M_{3 \times 4}$.
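Property 1 reduces to a small linear least-squares problem. The following is a minimal sketch in C++ using the Eigen library (an assumption; the paper does not specify its linear algebra code), with hypothetical function and argument names:

```cpp
#include <Eigen/Dense>

// Property 1 (Equation 1): recover the affine coordinates [X Y Z]^T of a 3D
// point from its projections proj[j] in the two control images (j = 0, 1),
// given the projections b[j][i] of the affine origin (i = 0) and of the
// three affine basis points (i = 1..3) in the same images.
Eigen::Vector3d affineReconstruct(const Eigen::Vector2d proj[2],
                                  const Eigen::Vector2d b[2][4]) {
    Eigen::Matrix<double, 4, 3> A;   // differences of basis and origin projections
    Eigen::Vector4d rhs;             // projections with the origin term moved over
    for (int j = 0; j < 2; ++j) {
        for (int i = 1; i <= 3; ++i) {
            A(2 * j,     i - 1) = b[j][i].x() - b[j][0].x();
            A(2 * j + 1, i - 1) = b[j][i].y() - b[j][0].y();
        }
        rhs(2 * j)     = proj[j].x() - b[j][0].x();
        rhs(2 * j + 1) = proj[j].y() - b[j][0].y();
    }
    // Four equations in three unknowns: solve in the least-squares sense.
    return A.colPivHouseholderQr().solve(rhs);
}
```

With more than two views, the same construction simply stacks two additional rows per view.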

3 Primitive Registration Algorithm

Based on the two properties of the ACS discussed above, we can compute the projections of a set of points at new camera positions and achieve the registration of virtual objects. The algorithm is given as follows. Steps 1 to 5 form the affine reconstruction stage, in which we define the ACS and specify the four planar points to set up a Euclidean WCS. Steps 6 and 7 form the affine reprojection stage, in which the projections of the specified points are estimated by the affine reprojection technique in the video sequence. Steps 8 and 9 form the registration stage, in which the camera pose is estimated and graphics pipeline techniques are used to render the virtual objects properly in the real scene. (A code sketch of the per-frame portion is given at the end of this section.)

Step 1: Track the natural feature points $p_i$ ($[u_{p_i}\ v_{p_i}]^T$) ($i = 1, \ldots, n$) in the video sequence using the KLT tracker.

Step 2: Choose two control images (1 and 2) and extract the corresponding natural feature points $\{p_i^1, p_i^2\}$ ($i = 1, \ldots, n$) (Figure 1).

Figure 1 Two control images: the circle points are the natural feature points tracked by the KLT tracker; (a) and (b) are the control images 1 and 2 respectively

Step 3: Define the ACS using four non-coplanar natural feature points tracked by the KLT tracker. As shown in Figure 2, we choose feature point 0 ($P_0$) as the origin and feature points 1, 3 and 4 ($P_1, P_2, P_3$) as the affine basis points.

Figure 2 ACS defined by 4 feature points

Step 4: Specify four points $x_i$ ($[u_i\ v_i]^T$) ($i = 1, \ldots, 4$) (the green crosses in Figure 3) in the two control images respectively. These points form an approximate square to set up the Euclidean WCS. The origin of the WCS is the center of the specified region. The X- and Y-axes are the directions of two different parallel sides respectively. The Z-axis is the cross product of X and Y. Some planar structures can be used in this step to help specify the planar points accurately in the control images.

Figure 3 Four points specified in the two control images 1 and 2 respectively; the green crosses are the points specified manually by the user

Step 5: Compute the 3D affine coordinates $[X_i\ Y_i\ Z_i\ 1]^T$ ($i = 1, \ldots, 4$) of the four specified points using the affine reconstruction property described in Section 2.

Step 6: In the kth image, compute the corresponding affine reprojection matrix $M^k_{3 \times 4}$ using the image projections of the affine origin and basis points tracked in this image (Equation 3):

$$M^k_{3 \times 4} =
\begin{bmatrix}
u^k_{p_1} - u^k_{p_0} & u^k_{p_2} - u^k_{p_0} & u^k_{p_3} - u^k_{p_0} & u^k_{p_0} \\
v^k_{p_1} - v^k_{p_0} & v^k_{p_2} - v^k_{p_0} & v^k_{p_3} - v^k_{p_0} & v^k_{p_0} \\
0 & 0 & 0 & 1
\end{bmatrix} \quad (3)$$

where $[u^k_{p_i}\ v^k_{p_i}]^T$ ($i = 0, 1, 2, 3$) are the updated image projections of the affine origin $P_0$ and basis points $P_1, P_2, P_3$ in the kth image.

Step 7: Use the 3D affine coordinates of the specified points computed in Step 5 and the updated affine reprojection matrix $M^k_{3 \times 4}$ to compute the projections of the specified points $x^k_i$ ($i = 1, \ldots, 4$) in the kth image using the affine reprojection property (Equation 4):

$$x^k_i = \begin{bmatrix} u^k_i \\ v^k_i \\ 1 \end{bmatrix} = M^k_{3 \times 4} \begin{bmatrix} X_i \\ Y_i \\ Z_i \\ 1 \end{bmatrix} \quad (4)$$

Step 8: Estimate the camera pose using the four specified planar points. This camera pose estimation approach based on a planar structure was proposed by Kato et al. (1999). The camera pose, consisting of rotation (R) and translation (t), represents the geometric relationship between the Euclidean WCS and the Camera Coordinate System (CCS) (Figure 4).

Figure 4 The relationship between the WCS and CCS (Kato et al. 1999)

Step 9: Since the intrinsic parameters are known in advance, using the pose parameters (R and t), graphics pipeline techniques, e.g., OpenGL, can be used to register the virtual objects at the proper positions in the real scene. Figure 5 shows the registration results (the red, green and blue lines represent the X, Y and Z axes of the WCS respectively). As shown in Figure 5(b), although the specified points are occluded by the user's body, the registration process can still be achieved properly.

Figure 5 Virtual objects registered in the real scene: (a) and (b) are the 69th and 166th images in the video sequence respectively
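As referenced above, the per-frame portion of the algorithm (Steps 6, 7 and 9) can be summarized in code. This is a minimal sketch, assuming C++ with Eigen and fixed-function OpenGL, with hypothetical function names; the pose estimation of Step 8 is delegated to a planar pose routine in the style of Kato et al. (1999), which is not reproduced here:

```cpp
#include <Eigen/Dense>
#include <GL/gl.h>

// Steps 6 and 7: rebuild the affine reprojection matrix M^k from the tracked
// projections b0..b3 of the affine origin and basis points (Equation 3), and
// reproject a specified point with affine coordinates [X Y Z 1]^T (Equation 4).
Eigen::Matrix<double, 3, 4> reprojectionMatrix(const Eigen::Vector2d& b0,
                                               const Eigen::Vector2d& b1,
                                               const Eigen::Vector2d& b2,
                                               const Eigen::Vector2d& b3) {
    Eigen::Matrix<double, 3, 4> M;
    M << b1.x() - b0.x(), b2.x() - b0.x(), b3.x() - b0.x(), b0.x(),
         b1.y() - b0.y(), b2.y() - b0.y(), b3.y() - b0.y(), b0.y(),
         0.0,             0.0,             0.0,             1.0;
    return M;
}

Eigen::Vector2d reproject(const Eigen::Matrix<double, 3, 4>& M,
                          const Eigen::Vector4d& affineCoord) {
    return (M * affineCoord).head<2>();  // third component is 1 by construction
}

// Step 9: load the estimated pose (R, t) as the OpenGL modelview matrix.
// OpenGL expects column-major storage, which matches Eigen's default; note
// that a sign flip of the Y and Z axes is usually needed to move from the
// computer-vision camera convention to the OpenGL one.
void loadModelview(const Eigen::Matrix3d& R, const Eigen::Vector3d& t) {
    Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
    T.topLeftCorner<3, 3>() = R;
    T.topRightCorner<3, 1>() = t;
    glMatrixMode(GL_MODELVIEW);
    glLoadMatrixd(T.data());
}
```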

4 Robust Registration Approach

In principle, the ACS can be defined by four non-coplanar points directly, and the affine reprojection matrix $M_{3 \times 4}$ can be updated in the live video sequence by tracking these four points. However, all tracking techniques are affected by noise, and an ACS defined directly by four feature points is sensitive to it. We performed a simple experiment to analyze the noise disturbance in the registration process. Since the registration error depends on the accuracy of the reprojections of the four specified points, we introduce an error function to analyze the reprojection accuracy:

$$\text{error} = \frac{\sum_{i=1}^{n} \| m_{ki} - \hat{m}_{ki} \|}{n} \quad (5)$$

where $m_{ki}$ represents the projections of the evaluation feature points in the kth video frame tracked by a robust tracker (these points are not used to compute the affine reprojection matrix, and their tracking results are relatively robust), $\hat{m}_{ki}$ represents the reprojections of the evaluation feature points obtained using the affine reprojection property, and $\| \cdot \|$ is the Euclidean distance between two image points. The result in Figure 6 shows that the reprojection error of the specified points greatly increases when the noise is high. In addition, if one of the affine component points (origin and basis points) cannot be tracked in a video frame, the registration will fail and the initialization steps need to be repeated. In order to increase the resistance to noise, a robust method is given herein, which defines the ACS and updates the affine reprojection matrix using a non-coplanar feature point set (with at least four feature points) instead of four non-coplanar feature points directly. The steps of the robust registration algorithm are similar to the procedure discussed in Section 3, except that the ACS definition, affine reconstruction and reprojection steps are executed in different ways.

Figure 6 Reprojection error versus Noise
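The error measure of Equation (5) is a simple mean of point distances. A small sketch, again assuming C++ with Eigen and hypothetical names:

```cpp
#include <Eigen/Dense>
#include <vector>

// Equation (5): mean Euclidean distance between the tracked projections m of
// the evaluation feature points in frame k and their affine reprojections mHat.
double reprojectionError(const std::vector<Eigen::Vector2d>& m,
                         const std::vector<Eigen::Vector2d>& mHat) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m.size(); ++i)
        sum += (m[i] - mHat[i]).norm();
    return sum / static_cast<double>(m.size());
}
```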

4.1 Define ACS using a Feature Point Set

In Step 3 of the algorithm discussed in Section 3, the ACS is defined directly using four non-coplanar feature points. Based on an observation in Weinshall et al. (1995), a robust ACS can instead be defined using a 3D point set: the center of mass of the 3D point set is the affine origin, and the principal components of this 3D point set form the affine basis (Figure 7). In this robust ACS, the projections of the four specified points can be estimated more stably in the live video sequence.

Figure 7 ACS defined by a feature point set
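Conceptually, this construction is a centroid plus principal components. The sketch below states it for a metric 3D point set (an idealization; in our setting the equivalent structure is recovered directly from image measurements, as described in Section 4.2), again assuming Eigen and hypothetical names:

```cpp
#include <Eigen/Dense>

// Section 4.1: define a robust ACS from a 3D point set. The centroid of the
// points serves as the affine origin and the principal directions of the
// centered point set as the affine basis.
void robustACS(const Eigen::Matrix3Xd& points,
               Eigen::Vector3d& origin, Eigen::Matrix3d& basis) {
    origin = points.rowwise().mean();                    // affine origin
    Eigen::Matrix3Xd centered = points.colwise() - origin;
    Eigen::JacobiSVD<Eigen::Matrix3Xd> svd(centered, Eigen::ComputeFullU);
    basis = svd.matrixU();                               // principal directions
}
```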

4.2 Affine Reconstruction in ACS Defined by a Feature Point Set

Based on the affine reprojection property discussed in Section 2, a new affine property can be derived: in affine projection, the center of mass of a 3D point set in the ACS is projected at the center of mass of the projections of these points. Specifically, if $P_c$ ($[X_c\ Y_c\ Z_c\ 1]^T$) and $p_c$ ($[u_c\ v_c]^T$) are the center of mass of a set of 3D points $\{P_1, \ldots, P_n\}$ and the center of mass of their corresponding projections $\{p_1, \ldots, p_n\}$ respectively, then the projection of $P_c$ is $p_c$. Given the projections of the natural feature points $P_1, \ldots, P_n$ ($n \geq 4$) in the two control images, a centered-measurement matrix $A_{4 \times n}$ (Tomasi et al. 1992) is defined as follows:

$$A_{4 \times n} =
\begin{bmatrix}
u^1_{p_1} - u^1_c & \cdots & u^1_{p_n} - u^1_c \\
v^1_{p_1} - v^1_c & \cdots & v^1_{p_n} - v^1_c \\
u^2_{p_1} - u^2_c & \cdots & u^2_{p_n} - u^2_c \\
v^2_{p_1} - v^2_c & \cdots & v^2_{p_n} - v^2_c
\end{bmatrix} \quad (6)$$

where $[u^i_{p_j}\ v^i_{p_j}]^T$ ($i = 1, 2$; $j = 1, \ldots, n$) is the projection of $P_j$ in the two control images and $[u^i_c\ v^i_c]^T$ ($i = 1, 2$) is the center of mass of these projections.

It is known that under noise-free conditions the rank of the centered-measurement matrix is three (Tomasi et al. 1992). A matrix of rank three has a basis of three linearly independent components (principal components), and these principal components can be used to define the ACS. In practice, the rank of the centered-measurement matrix is not exactly three because of noise. The singular value decomposition (SVD) technique can be used to decompose the centered-measurement matrix in order to obtain the approximate principal components and compute the affine coordinates of each natural feature point:

$$A_{4 \times n} = U_{4 \times 4} D_{4 \times n} V^T_{n \times n} \quad (7)$$

where $U_{4 \times 4} = [\vec{u}_1\ \vec{u}_2\ \vec{u}_3\ \vec{u}_4]$, and $\vec{u}_1, \ldots, \vec{u}_4$ are the eigenvectors of $AA^T$; $V_{n \times n} = [\vec{v}_1\ \cdots\ \vec{v}_n]$, and $\vec{v}_1, \ldots, \vec{v}_n$ are the eigenvectors of $A^T A$; and

$$D_{4 \times n} = \begin{bmatrix} D_0 & 0 \\ 0 & 0 \end{bmatrix}, \quad D_0 = \mathrm{diag}(\sigma_1, \ldots, \sigma_r), \quad \sigma_1 \geq \cdots \geq \sigma_r > 0$$

where the $\sigma_i$ are the non-vanishing singular values of $A$.

If $U_{4 \times 3}$, $D_{3 \times 3}$ and $V_{n \times 3}$ are the upper $4 \times 3$, $3 \times 3$ and $n \times 3$ blocks of U, D and V respectively, the affine reconstruction matrix and the reprojection matrix of the kth ($k = 1, 2$) control image are given by Equations (8) and (9):

$$\begin{bmatrix}
u^1_{b_1} - u^1_c & u^1_{b_2} - u^1_c & u^1_{b_3} - u^1_c \\
v^1_{b_1} - v^1_c & v^1_{b_2} - v^1_c & v^1_{b_3} - v^1_c \\
u^2_{b_1} - u^2_c & u^2_{b_2} - u^2_c & u^2_{b_3} - u^2_c \\
v^2_{b_1} - v^2_c & v^2_{b_2} - v^2_c & v^2_{b_3} - v^2_c
\end{bmatrix} = U_{4 \times 3} (D_{3 \times 3})^{\frac{1}{2}} \quad (8)$$

$$M^k_{3 \times 4} =
\begin{bmatrix}
u^k_{b_1} - u^k_c & u^k_{b_2} - u^k_c & u^k_{b_3} - u^k_c & u^k_c \\
v^k_{b_1} - v^k_c & v^k_{b_2} - v^k_c & v^k_{b_3} - v^k_c & v^k_c \\
0 & 0 & 0 & 1
\end{bmatrix} \quad (9)$$

where $[u^k_{b_i}\ v^k_{b_i}]^T$ ($k = 1, 2$; $i = 1, 2, 3$) are the projections of the affine basis points defined by the 3D point set, and $[u^k_c\ v^k_c]^T$ is the projection of the affine origin, i.e., the centroid of the 3D point set.

The corresponding affine coordinates of the natural feature points can be computed using Equation (10):

$$P = [P_1\ \cdots\ P_n] = \begin{bmatrix} (D_{3 \times 3})^{\frac{1}{2}} (V_{n \times 3})^T \\ \mathbf{1}_{1 \times n} \end{bmatrix} \quad (10)$$

where $\mathbf{1}_{1 \times n}$ is a row of ones that makes the affine coordinates homogeneous. Since the reprojection matrices of both control images ($M^1_{3 \times 4}$, $M^2_{3 \times 4}$) can be obtained from Equations (8) and (9), the affine coordinates of the four specified points can be reconstructed in terms of Equation (1).

4.3 Update Affine Reprojection Matrix in ACS Defined by a Feature Point Set

From Step 6 of the algorithm given in Section 3, the updated affine reprojection matrix $M^k_{3 \times 4}$ in the kth image can be computed directly from the image coordinates of the affine origin and the basis points tracked in this image. If a feature point set is used to define the ACS as in Section 4.1, the image coordinates of the affine origin and basis points cannot be tracked directly using the KLT tracker, which means the reprojection matrix cannot be updated directly using Equation (3). Alternatively, the reprojection matrix can be estimated by tracking the updated image projections of the 3D point set used to define the ACS.

Equation (11) shows that the affine reprojection matrix $M_{3 \times 4}$ can be estimated by solving a linear system. This equation can be deduced from the affine reprojection property (Equation 2):

$$[p_1\ \cdots\ p_n] = M_{3 \times 4} [P_1\ \cdots\ P_n] \quad (11)$$

where $[p_1\ \cdots\ p_n]$ are the image projections of the natural feature points in the 3D point set and $[P_1\ \cdots\ P_n]$ are the affine coordinates of the natural feature points, which can be computed using Equation (10). Since $M_{3 \times 4}$ has eight unknowns and each natural feature point generates two equations, this linear system can be solved if at least four natural feature points can be tracked in the live video sequence, which means the registration process remains effective as long as four natural feature points are tracked. (A code sketch of this robust setup and per-frame update follows.)
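To make Sections 4.2 and 4.3 concrete, the sketch below builds the centered-measurement matrix from the two control images, factorizes it by SVD and recovers the homogeneous affine coordinates of the feature points (Equations 6, 7 and 10), then re-estimates the reprojection matrix in each frame by linear least squares (Equation 11). It assumes C++ with Eigen and hypothetical names; it is an illustration, not the authors' implementation:

```cpp
#include <Eigen/Dense>

// Section 4.2: given the stacked projections of n feature points in the two
// control images (rows: u1, v1, u2, v2), recover their homogeneous affine
// coordinates in the robust ACS (Equations 6, 7 and 10).
Eigen::Matrix4Xd robustAffineCoordinates(const Eigen::Matrix4Xd& obs) {
    Eigen::Vector4d c = obs.rowwise().mean();    // projected centroid per row
    Eigen::Matrix4Xd A = obs.colwise() - c;      // centered-measurement matrix (6)
    Eigen::JacobiSVD<Eigen::Matrix4Xd> svd(A, Eigen::ComputeThinU | Eigen::ComputeThinV);
    Eigen::Vector3d sqrtD = svd.singularValues().head<3>().cwiseSqrt();
    Eigen::Matrix4Xd P(4, obs.cols());
    P.topRows<3>() = sqrtD.asDiagonal()
                     * svd.matrixV().leftCols<3>().transpose();  // D^{1/2} V^T (10)
    P.row(3).setOnes();                          // homogeneous coordinates
    return P;
}

// Section 4.3 (Equation 11): per frame, estimate M (3x4, last row fixed to
// [0 0 0 1]) from the tracked projections p (2 x n) of at least four feature
// points and their affine coordinates P (4 x n), by linear least squares.
Eigen::Matrix<double, 3, 4> updateReprojection(const Eigen::Matrix2Xd& p,
                                               const Eigen::Matrix4Xd& P) {
    // Solve P^T * Mtop^T = p^T for the top two rows of M.
    Eigen::Matrix<double, Eigen::Dynamic, 2> rhs = p.transpose();
    Eigen::Matrix<double, 4, 2> MtopT =
        P.transpose().colPivHouseholderQr().solve(rhs);
    Eigen::Matrix<double, 3, 4> M;
    M.topRows<2>() = MtopT.transpose();
    M.row(2) << 0.0, 0.0, 0.0, 1.0;
    return M;
}
```

Because the system is solved in the least-squares sense over all tracked points, the update degrades gracefully under noise and, as noted above, remains usable whenever at least four of the feature points are tracked.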

5 Experiments

The system prototype has been implemented in C++ using the OpenGL and GLUT libraries on an IBM PC with a Pentium 4 processor (1.8 GHz). The video sequence is captured with an IEEE 1394 FireFly camera. The virtual models are imported from 3DS files. The image size is 640x480, and the system runs in real time (around 20 fps). Using the error definition in Section 4, some experiments have been performed to analyze the registration accuracy of the proposed algorithm. First, we tested the accuracy of the two methods discussed in Sections 3 and 4. As described in Section 4, we introduced noise to disturb the feature point tracking during the experiment. The result in Figure 8 shows that the robust method, in which the ACS is defined by a feature point set (with 12 feature points), is more stable than the primitive method in Section 3.

Figure 8 Error data comparison between the primitive and robust approaches

In addition, we compared the registration accuracy of the robust affine approach with that of the projective method given by Yuan et al. (2005). In the experiment, we found that the effects of camera translations along the optical axis were minor, while rotations mattered. We changed the camera viewpoint angle from about 90 degrees (the camera's optical axis orthogonal to the real scene) to 45 degrees in the live video sequence. The experimental results are shown in Figure 9. Some conclusions can be drawn from the experiment: 1) The proposed approach is more stable than the method in Yuan et al. (2005) during camera motion, especially when only a small number of natural feature points can be tracked. 2) The experimental results are affected by the conditions of tracking the feature points; when the feature points are stable, better results can be obtained. 3) Generally, both methods perform better when more natural feature points are used.

Figure 9 Error data comparison between the proposed approach and the method in Yuan et al. (2005) at viewpoint angles of about 90 and 45 degrees: (a), (b), (c) and (d) are the results of using the reprojection matrix estimated by 8, 9, 10 and 11 feature points respectively

6 Discussions and Potential Applications in User Interface

Augmented Reality techniques provide a promising interface for human-computer interaction (HCI) (Azuma 1997, 2001). Virtual control panels and 3D virtual objects can be registered on a real scene to generate innovative user interfaces for different applications. Many marker-based AR interface systems have been reported, e.g., AR assembly guidance (Zauner et al. 2003), AR product design (Frund et al. 2003) and AR entertainment (Thomas et al. 2000). These marker-based systems require pre-defined markers in the real environment for camera pose estimation, which is not convenient for outdoor environments. Since pre-defined markers are not required, our approach is suitable for outdoor AR applications. In addition, marker-based systems are limited by the occlusion problem. With the proposed approach, a stable WCS is defined in which virtual objects can still be superimposed at the proper positions even though the specified points may be occluded during the registration process. This is important for some AR applications. For example, in an AR assembly simulation system, virtual object registration and assembly motion analysis are based on a universal WCS. In conventional marker-based systems, since the WCS is attached to a marker, the motion information will be lost and the registration of the virtual assembly parts will fail if the marker is partially occluded by the human body or the real assembly parts.

7 Conclusions

In this paper, we presented a markerless registration method for AR systems based on the KLT tracker and the affine reconstruction and reprojection techniques. Experimental results have shown the improvement of the proposed approach over previous work. One limitation of the proposed approach is that the user has to specify four points manually in the initialization stage. However, if there is a square planar structure in the real scene that can be recognized by the tracking techniques, the entire registration process can be performed automatically without manual input. This work relies on the KLT tracker and does not consider tracking the feature points robustly. These issues will be addressed in the future.

8 References

Azuma R. (1997): A Survey of Augmented Reality, Presence, 6(4):355-385.

Azuma R., Baillot Y., Behringer R., Feiner S., Julier S. and MacIntyre B. (2001): Recent Advances in Augmented Reality, IEEE Computer Graphics and Applications, 21(6):34-47.

Ong K.C., Teh H.C. and Tan T.S. (1998): Resolving Occlusion in Image Sequence Made Easy, The Visual Computer, 14:153-165.

Seo Y. and Hong K.S. (2000): Calibration-Free Augmented Reality in Perspective, IEEE Transactions on Visualization and Computer Graphics, 6(4):346-359.

Kutulakos K.N. and Vallino J.R. (1998): Calibration-Free Augmented Reality, IEEE Transactions on Visualization and Computer Graphics, 4(1):1-20.

Kato H. and Billinghurst M. (1999): Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System, Proc. of the 2nd IEEE and ACM Int'l Workshop on Augmented Reality, pp. 85-93.

Zauner J., Haller M. and Brandl A. (2003): Authoring of a Mixed Reality Assembly Instructor for Hierarchical Structures, Proc. of the 2nd IEEE and ACM Int'l Symposium on Mixed and Augmented Reality, pp. 237-246.

Frund J., Gausemeier J., Matysczok C. and Radkowski R. (2003): Application Areas of AR-Technology within Automobile Advance Development, Int'l Workshop on Potential Industrial Applications of Mixed and Augmented Reality, pp. 1-7.

Thomas B., Close B., Donoghue J., Squires J., De Bondi P., Morris M. and Piekarski W. (2000): ARQuake: An Outdoor/Indoor Augmented Reality First Person Application, Proc. of the 4th Int'l Symposium on Wearable Computers, pp. 139-146.

Yuan M.L., Ong S.K. and Nee A.Y.C. (2005): Registration Based on Projective Reconstruction Technique for Augmented Reality Systems, IEEE Transactions on Visualization and Computer Graphics, 11(3):254-264.

Tomasi C. and Kanade T. (1991): Detection and Tracking of Point Features, Carnegie Mellon University Technical Report CMU-CS-91-132.

Chia K.W., Cheok A.D. and Prince S.J.D. (2002): Online 6 DOF Augmented Reality Registration from Natural Features, Proc. of the Int'l Symposium on Mixed and Augmented Reality, pp. 305-313.

Comport A.I., Marchand E. and Chaumette F. (2003): A Real-time Tracker for Markerless Augmented Reality, Proc. of the 2nd IEEE and ACM Int'l Symposium on Mixed and Augmented Reality, pp. 36-45.

Vacchetti L., Lepetit V. and Fua P. (2004): Combining Edge and Texture Information for Real-time Accurate 3D Camera Tracking, Proc. of the 3rd IEEE and ACM Int'l Symposium on Mixed and Augmented Reality, pp. 48-56.

Najafi H., Navab N. and Klinker G. (2004): Automated Initialization for Marker-less Tracking: A Sensor Fusion Approach, Proc. of the 3rd IEEE and ACM Int'l Symposium on Mixed and Augmented Reality, pp. 79-88.

Gordon I. and Lowe D.G. (2004): Scene Modelling, Recognition and Tracking with Invariant Image Features, Proc. of the 3rd IEEE and ACM Int'l Symposium on Mixed and Augmented Reality, pp. 110-119.

Koenderink J.J. and van Doorn A.J. (1991): Affine Structure from Motion, Journal of the Optical Society of America A, 8(2):377-385.

Weinshall D. and Tomasi C. (1995): Linear and Incremental Acquisition of Invariant Shape Models from Image Sequences, IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(5):512-517.

Tomasi C. and Kanade T. (1992): Shape and Motion from Image Streams: A Factorization Method, Full Report on the Orthographic Case, Cornell TR 92-1270 and Carnegie Mellon CMU-CS-92-104.