2017 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI 2017), November 16-18, 2017, Daegu, Korea

3D Handheld Scanning Based on Multiview 3D Registration Using Kinect Sensing Device

Shirazi Muhammad Ayaz, Danish Khan and Min Young Kim*

Abstract— This paper describes the implementation of a 3D handheld scanning approach based on the Kinect. Users can acquire 3D scans at a very fast rate with real-time scanning devices such as the Kinect. These devices have been utilized in several applications, but their scans lack accuracy and reliability, which makes their employment difficult. This research proposes a 3D handheld scanning approach in which the Kinect renders 3D point cloud data for different views and registers them using visual navigation and ICP. This research also compares several ICP variants with the proposed method. The proposed approach can be used for 3D modeling applications, especially in the medical domain. Experiments and results demonstrate the feasibility of the proposed approach in generating reliable 3D reconstructions from Kinect point clouds.

I. INTRODUCTION

Three-dimensional reconstruction is a fundamental problem in computer vision, with interesting applications in range sensing, industrial inspection, reverse engineering, object recognition, 3D map building, and medical and scientific imaging [1, 2]. One of the difficult problems in computer vision is the 3D reconstruction of the shape of complex objects. The majority of methods are based on rendering the 3D point cloud of each view and integrating the views into a 3D model. Nowadays, real-time and low-cost 3D scanners such as the Kinect allow 3D scans to be acquired at high speed with high-resolution depth and visual (RGB) sensing. However, the Kinect has some problems, i.e., inaccurate, noisy and missing depth data (holes), inaccurate point clouds due to inadequate calibration performed during manufacturing, and erroneous edges created in 3D mapping [3]. To reconstruct static 3D scenes, KinectFusion was proposed by R. A. Newcombe et al. [11]. KinectFusion utilizes ICP [12] based alignment of the Kinect's frames, and a complete 3D scene is produced in real time using a variant of the TSDF (truncated signed distance function).

*Research supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (NRF-2016M2A2A4A04913462), the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2016R1D1A3B03930798), and the ICT R&D program of MSIT/IITP (R7124-16-0004, Development of Intelligent Interaction Technology Based on Context Awareness and Human Intention Understanding).
Shirazi Muhammad Ayaz, Danish Khan and Min Young Kim are with the School of Electronics Engineering, IT College, Kyungpook National University, 1370 Sankyuk-dong, Buk-gu, Daegu, 41566, Korea (corresponding author: Min Young Kim; phone: +82-53-950-7233; fax: +82-53-950-5505; e-mail: [email protected]).


Despite the good accuracy of its 3D reconstructions, KinectFusion requires a large amount of memory, which makes it feasible only for small scenes; a Point Cloud Library based implementation, KinFu, is also available [13]. In [14], the Kinect uses both visual and geometric features to reconstruct indoor scenes. Commercial applications such as ReconstructMe and Skanect [15, 16] generate complete 3D models, but the results are not always accurate. The research in [17] discusses two approaches to 3D registration, image-feature based and geometric-feature based methods, and concludes that the image-feature based approach works when there is sufficient texture in the scene, while the geometric-feature based approach is feasible for textureless scenes provided a sufficient amount of geometric information is available.

In order to solve the problems of inaccurate and unreliable Kinect data, we propose a 3D handheld scanning approach. The multiview registration has two stages, i.e., rough registration and alignment of the point clouds using the ICP (iterative closest point) algorithm. The ways of extracting useful information (features) from depth images do not yet seem sophisticated enough [3]; that is why we use the Kinect's RGB camera for visual pose and position estimation rather than depth images.

This paper is organized as follows: Section II describes the methodology of the proposed approach, including the overall algorithm, the Kinect based handheld scanning approach, visual navigation and multiview registration. Section III describes the conducted experiments and discusses the results of the handheld scanning approach. Section IV concludes the proposed approach and outlines directions for future work.

II. METHODOLOGY

A. Kinect based Handheld Scanning Approach

We used the Microsoft Kinect sensor to obtain 3D and RGB data. The Kinect sensor is composed of a three-channel RGB camera (640x480 pixels), an IR projector and an IR camera, all of which are calibrated in advance. The depth resolution is 1 cm and the spatial resolution is 3 mm at a 2 m distance. The specifications of the Kinect device are shown in Table I. The proposed hardware setup for multiview registration is based on the Kinect device, with the 3D object and a chessboard target placed at a certain distance. The proposed approach first acquires the 3D point clouds for the different views and captures a chessboard image for each view using the Kinect's RGB camera under handheld motion. These RGB images are used to estimate the relative rotation and translation parameters between the different views using visual navigation, and multiview registration is performed to merge the point clouds using the estimated parameters and ICP. The block diagram of the proposed approach is shown in Fig. 1.
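To make this pose-estimation step concrete, the following is a minimal OpenCV sketch of how the relative pose between two chessboard views could be obtained. The 9x6 inner-corner pattern size and the function name are our assumptions, and the use of recoverPose (rather than the explicit SVD procedure detailed in Section II-B) is a shortcut for illustration, not the paper's implementation.

```python
import cv2

# Assumed values for illustration: a 9x6 inner-corner chessboard and a
# pre-calibrated 3x3 camera matrix K from Zhang's method [6].
def relative_pose(img1, img2, K, pattern=(9, 6)):
    ok1, c1 = cv2.findChessboardCorners(img1, pattern)
    ok2, c2 = cv2.findChessboardCorners(img2, pattern)
    if not (ok1 and ok2):
        raise RuntimeError("chessboard not detected in both views")
    E, _ = cv2.findEssentialMat(c1, c2, K)
    # recoverPose resolves the four-fold ambiguity via a cheirality check;
    # t is recovered only up to scale.
    _, R, t, _ = cv2.recoverPose(E, c1, c2, K)
    return R, t
```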


Figure 1. Block diagram of the proposed approach.

TABLE I. SPECIFICATIONS OF MICROSOFT KINECT

S.No. | Properties | Specifications
1. | Image size | 640 x 480 pixels
2. | Frame rate | 30 fps
3. | Depth z resolution (at 2 m distance) | 1 cm
4. | Spatial x/y resolution (at 2 m distance) | 3 mm

B. Visual Navigation

The visual navigation algorithm uses the chessboard images to estimate the essential matrix and the relative pose between two views of the camera. For chessboard detection in the different views, the corner detector of [4] is used. After chessboard detection, we estimate the essential matrix and extract the relative orientation and translation parameters of the different views from it using the method of [5]. Zhang's method [6] was used for the calibration of the Kinect's RGB camera. The essential matrix contains the translation and rotation information between the two camera positions in physical space [10]; hence there is a motivation to estimate the translation and rotation from the essential matrix. The procedure for estimating the translation and rotation parameters between two views follows [5]. The normalized image coordinates are mathematically defined as follows:

P = K[R | t]   (1),   x = PX   (2)

where P is the projection matrix, K the intrinsic matrix, R the rotation matrix, t the translation matrix (or vector), X the 3D coordinates and x the image coordinates. Combining (1) and (2),

x = K[R | t]X   (3),   x1 = K^-1 x = [R | t]X   (4)

where x1 denotes the normalized image coordinates. The essential matrix E can be determined using the following epipolar constraint:

y'^T E y = 0   (5)

where y and y' are the homogeneous normalized image coordinates in image 1 and image 2 respectively. After applying the singular value decomposition (SVD), the relative orientation and translation parameters can be estimated as follows:

E = U Σ V^T   (6)

where U is a complex unitary matrix, Σ a rectangular diagonal matrix, and V^T the conjugate transpose of the unitary matrix V. We initialize a special matrix W to estimate the rotation R and translation T matrices [5]. This matrix is special because it is orthogonal and allows the rotation and translation information to be recovered from the essential matrix. The rotation matrix is given as follows:

R = U W^-1 V^T   (7)

The translation parameters can be estimated only up to a scale factor and with a four-fold ambiguity, i.e., four possible solutions; the best solution can be selected based on experiments [5]. The equations for the translation matrix T are as follows:

T = U W Σ U^T   (8)

In case Σ does not satisfy the constraints when real-world data is used, the equations become

T = U Z U^T   (9),   T = V Z V^T   (10)

where Z = diag(1, 1, 0) W is a skew-symmetric matrix and

W = [ 0 -1 0 ; 1 0 0 ; 0 0 1 ],   Z = [ 0 -1 0 ; 1 0 0 ; 0 0 0 ],   Σ = [ s 0 0 ; 0 s 0 ; 0 0 0 ]   (11)

Alternatively, if we put s = -1 in Σ, the product ΣW becomes equal to the Z matrix up to sign, which points out the consistency of the method based on Z. The translation parameters along the x, y and z axes can be read from the skew-symmetric matrix T as follows:

T = [ 0 -tz ty ; tz 0 -tx ; -ty tx 0 ]   (12)

where (tx, ty, tz) are the translations along the x, y and z axes respectively.
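For illustration, the following NumPy sketch decomposes an essential matrix along the lines of eqs. (6)-(12). It returns only one of the four possible solutions, and the solution-selection checks of [5] are omitted, so treat it as a sketch rather than the authors' implementation.

```python
import numpy as np

W = np.array([[0., -1., 0.],
              [1.,  0., 0.],
              [0.,  0., 1.]])
Z = np.diag([1., 1., 0.]) @ W        # the skew matrix Z of eqs. (9)-(10)

def decompose_essential(E):
    """Recover R and t from E via SVD, following eqs. (6)-(12).
    Returns one of the four possible solutions; the experiment-based
    solution selection of [5] is omitted here."""
    U, S, Vt = np.linalg.svd(E)      # E = U Sigma V^T, eq. (6)
    R = U @ W.T @ Vt                 # eq. (7); W is orthogonal, so inv(W) = W^T
    if np.linalg.det(R) < 0:         # enforce a proper rotation
        R = -R
    T = U @ Z @ U.T                  # eq. (9); skew-symmetric, sign ambiguous
    t = np.array([T[2, 1], T[0, 2], T[1, 0]])  # (tx, ty, tz) via eq. (12)
    return R, t
```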

C. Multiview Registration of Point Clouds

The rough registration is based on the transformation of the point clouds of the different views into the coordinate frame of the reference-view point cloud via the visual navigation described in Section II-B. The equation governing the rough registration is as follows:

Xi_ref = Ri Xi + Ti   (13)

where Xi is the 3D point cloud of the i-th view, Ri and Ti are the relative rotation and translation between the i-th view and the reference-view point cloud, and Xi_ref is the i-th point cloud transformed into the reference view.
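Concretely, eq. (13) amounts to one rigid transform per view. A minimal NumPy sketch follows; the function name and the (N, 3) array layout are our own conventions, not the paper's.

```python
import numpy as np

def rough_register(clouds, rotations, translations):
    """Apply eq. (13), Xi_ref = Ri @ Xi + Ti, to each (N, 3) point cloud."""
    return [X @ R.T + T for X, R, T in zip(clouds, rotations, translations)]
```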


In order to refine the roughly registered point clouds, an ICP algorithm based on a k-d tree, the point-to-plane error metric and rejection of the worst 10% of pairs [7] was used. This algorithm takes point clouds which are already in approximate registration with the reference view. The ICP algorithm consists of the following main steps (a code sketch follows the list):
1. K-d tree based matching of the two point clouds.
2. Finding the relative orientation and translation parameters from the matched 3D points.
3. Transformation of the i-th point cloud into the coordinate frame of the reference-view point cloud using the estimated parameters.
4. Estimation of the root mean square (RMS) error for the point-to-plane error metric.
5. Repeating steps 1-4 until ICP converges to an acceptable solution.
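The loop below is a compact NumPy/SciPy sketch of these five steps under the point-to-plane metric with worst-10% pair rejection. The linearized small-angle solver and the convergence threshold are our assumptions; the paper itself used the MATLAB implementation of [9].

```python
import numpy as np
from scipy.spatial import cKDTree

def small_angle_rotation(w):
    """First-order rotation matrix for small angles (rx, ry, rz)."""
    rx, ry, rz = w
    return np.array([[1.0, -rz,  ry],
                     [ rz, 1.0, -rx],
                     [-ry,  rx, 1.0]])

def icp_point_to_plane(src, dst, dst_normals, iters=30, reject=0.10, tol=1e-6):
    """Point-to-plane ICP with k-d tree matching and worst-10% pair
    rejection. Aligns src (N,3) onto dst (M,3) with unit normals
    dst_normals (M,3). Returns rotation R, translation t and RMS error."""
    tree = cKDTree(dst)                                  # step 1: k-d tree
    R, t = np.eye(3), np.zeros(3)
    rms_prev = np.inf
    for _ in range(iters):
        cur = src @ R.T + t                              # step 3: current pose
        dist, idx = tree.query(cur)                      # step 1: matching
        keep = dist <= np.quantile(dist, 1.0 - reject)   # reject worst 10%
        p, q = cur[keep], dst[idx[keep]]
        n = dst_normals[idx[keep]]
        # step 2: linearized point-to-plane system A x = b,
        # with x = (rx, ry, rz, tx, ty, tz)
        A = np.hstack([np.cross(p, n), n])
        b = np.einsum('ij,ij->i', q - p, n)
        x, *_ = np.linalg.lstsq(A, b, rcond=None)
        dR = small_angle_rotation(x[:3])
        R, t = dR @ R, dR @ t + x[3:]
        rms = np.sqrt(np.mean(b ** 2))                   # step 4: RMS error
        if abs(rms_prev - rms) < tol:                    # step 5: convergence
            break
        rms_prev = rms
    return R, t, rms
```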

III. EXPERIMENTS AND RESULTS

We performed experiments with an artificial skull to demonstrate the medical applications of the 3D modeling. We obtained 3D data of the skull object and RGB images under handheld motion using the Microsoft Kinect device. After obtaining the 3D data, we performed rough registration and applied the ICP algorithm to further refine the roughly registered data. Fig. 2 shows the experimental setup, with the skull object and chessboard target placed 70 cm from the Kinect device. The registration results of the four point clouds with respect to the reference view, before and after applying the coarse registration and the ICP algorithm, are shown in different colors in Fig. 3, and the RMS error for each point cloud aligned with respect to the reference view is recorded in Table II. The RMS error was found to be less than 1 mm, which demonstrates the good accuracy of the ICP algorithm.

In order to compare the proposed ICP algorithm (k-d tree, point-to-plane error metric, rejection of the worst 10% of pairs) with other variants, we followed a procedure similar to those mentioned in [7, 8] and utilized the implementation of the ICP variants from [9] for comparison with our ICP algorithm. We compared the proposed ICP algorithm with the other variants in terms of convergence, speed, robustness and accuracy. For the robustness test, we placed the skull object on a turntable and rotated it by 2, 4 and 6 deg in the clockwise and anticlockwise directions about the y-axis of the Kinect sensor, taking the position before rotation as the reference view (zero-deg position). We thus captured 3D data at the reference, clockwise and anticlockwise positions, and applied ICP to estimate the angle at the rotated positions; the results are shown in Fig. 4(a). Fig. 4(a) shows that the point-to-plane and point-to-point error metrics with worst-10% pair rejection outperformed the others and displayed symmetry on either side of the zero-deg position. The comparison of the speed of the ICP variants is shown in Fig. 4(b): the brute-force and k-d tree based matching strategies were found to be the most and least computationally expensive respectively. We also performed the convergence test for the different error metrics, matching strategies and rejection strategies. Fig. 4(c) shows that point-to-plane outperformed all the other error metrics as far as convergence is concerned; the convergence behavior of the point-to-point metric is similar to that of the point-to-point based Levenberg-Marquardt (LM) minimization strategy. The convergence results for the matching strategies in Fig. 4(d) lead to the conclusion that the brute-force, k-d tree and Delaunay matching strategies performed better than the extrapolated k-d tree and LM based k-d tree matching. Finally, Fig. 4(e) shows that the worst-10% pair rejection performed better than the edge-rejection based strategy.
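For reference, the angle reported in this turntable test can be read off an ICP-estimated rotation matrix as sketched below; the helper assumes the rotation is purely about the sensor's y-axis, which is our simplification.

```python
import numpy as np

def yaw_about_y(R):
    """Angle (deg) of a rotation assumed to be purely about the y-axis:
    R = [[cos a, 0, sin a], [0, 1, 0], [-sin a, 0, cos a]]."""
    return np.degrees(np.arctan2(R[0, 2], R[0, 0]))
```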

TABLE II. RMS ERROR FOR FOUR DIFFERENT-VIEW POINT CLOUDS ALIGNED USING ICP WITH RESPECT TO THE REFERENCE VIEW

S.No. | Point cloud 1 | Point cloud 2 | RMS error (mm)
1. | Reference view | Point cloud 1 | 0.9588
2. | Reference view | Point cloud 2 | 0.9949
3. | Reference view | Point cloud 3 | 0.9715
4. | Reference view | Point cloud 4 | 0.9174

Figure 2. Experimental setup for the Kinect based handheld scanner with the skull object and chessboard target.


For the comparison of the ICP variants with our algorithm, we captured Kinect 3D data with the skull object on a turntable rotated by 2, 4 and 6 deg about the y-axis. The measurements of the respective angles using the ICP variants are recorded in Table III, which concludes that the point-to-plane metric with the worst-10% rejection strategy outperformed the other variants. Fig. 5 shows the comparison between a single 3D scan and the integration of the five point clouds registered using multiview registration. We combined the five different point clouds using a box grid filter. Fig. 5(a) and Fig. 5(b) show the skull object and a single 3D scan respectively, while Fig. 5(c) depicts the integration of the five point clouds, which is better than the single 3D scan. Hence the integration of multiple point clouds may enhance the 3D modeling accuracy.
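As an illustration of this integration step, the sketch below merges registered clouds with Open3D's voxel down-sampling, which plays a role similar to the box grid filter used here; the 5 mm voxel size is an assumed value, not the paper's setting.

```python
import numpy as np
import open3d as o3d

def merge_clouds(clouds, voxel_size=0.005):
    """Stack the registered (N, 3) clouds and thin them on a regular grid.
    Open3D's voxel down-sampling averages the points in each voxel,
    similar in spirit to a box grid filter."""
    merged = o3d.geometry.PointCloud()
    merged.points = o3d.utility.Vector3dVector(np.vstack(clouds))
    return merged.voxel_down_sample(voxel_size=voxel_size)
```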

IV. CONCLUSION

In this research, a 3D handheld scanning approach based on the Kinect device is proposed. The proposed hardware consists of the Kinect device moved by hand to capture point clouds of different views. For visual navigation, a chessboard-target based approach was utilized in this research. For the multiview 3D approach, rough registration followed by final alignment of the point clouds using ICP further improves the accuracy of the 3D scanning. The proposed system is feasible for 3D handheld scanning and finds application in medical imaging. We also reported a comprehensive comparison of several ICP variants on the 3D data of the skull object captured using the Kinect. The RMS error of aligning the different point clouds was found to be less than 1 mm. Future directions include a real-time implementation of the handheld scanning approach with feature-based visual navigation, avoiding the special target object.


Figure 3. Multiview 3D registration results: (a) four point clouds and the reference-view point cloud before rough registration and ICP; (b) the same point clouds after rough registration and ICP.

TABLE III. COMPARISON OF THE PROPOSED ICP ALGORITHM AND OTHER VARIANTS FOR ACCURACY

S.No. | ICP variant | Measured angle (deg) | Ground-truth angle (deg)
1. | Point to point | 2.13, 2.84, 5.18 | 2, 4, 6
2. | Point to plane | 1.73, 3.13, 5.46 | 2, 4, 6
3. | Point to plane with 10% worst rejection | 1.49, 3.46, 5.81 | 2, 4, 6
4. | Point to point with edge rejection | 2.13, 2.90, 5.21 | 2, 4, 6
5. | Point to plane with edge rejection | 1.73, 3.20, 5.47 | 2, 4, 6
6. | LM with edge rejection | 2.13, 2.90, 5.21 | 2, 4, 6
7. | Point to point with 10% worst rejection | 1.43, 3.03, 5.26 | 2, 4, 6


Figure 4. Comparison of the ICP algorithm and the other variants: (a) robustness; (b) speed; (c)-(e) convergence for the error metrics, matching strategies and rejection strategies respectively.


Figure 5. Comparison of a single scan and the integration of multiple point clouds: (a) artificial skull object; (b) single 3D point cloud; (c) integration of the four point clouds and the reference-view point cloud.

REFERENCES
[1] Salvi, Joaquim, Sergio Fernandez, Tomislav Pribanic, and Xavier Llado. "A state of the art in structured light patterns for surface profilometry." Pattern Recognition 43, no. 8 (2010): 2666-2680.
[2] Geng, Jason. "Structured-light 3D surface imaging: a tutorial." Advances in Optics and Photonics 3, no. 2 (2011): 128-160.
[3] Han, Jungong, Ling Shao, Dong Xu, and Jamie Shotton. "Enhanced computer vision with Microsoft Kinect sensor: A review." IEEE Transactions on Cybernetics 43, no. 5 (2013): 1318-1334.
[4] Geiger, Andreas, Frank Moosmann, Ömer Car, and Bernhard Schuster. "Automatic camera and range sensor calibration using a single shot." In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pp. 3936-3943. IEEE, 2012.
[5] Hartley, Richard, and Andrew Zisserman. Multiple View Geometry in Computer Vision, Second Edition. Cambridge University Press, 2000.
[6] Zhang, Zhengyou. "A flexible new technique for camera calibration." IEEE Transactions on Pattern Analysis and Machine Intelligence 22, no. 11 (2000): 1330-1334.
[7] Rusinkiewicz, Szymon, and Marc Levoy. "Efficient variants of the ICP algorithm." In 3-D Digital Imaging and Modeling, 2001. Proceedings. Third International Conference on, pp. 145-152. IEEE, 2001.
[8] Kjer, Hans Martin, and Jakob Wilm. "Evaluation of surface registration algorithms for PET motion correction." Bachelor's thesis, Technical University of Denmark, DTU, DK-2800 Kgs. Lyngby, Denmark, 2010.
[9] Wilm, Jakob, and H. M. Kjer. "Iterative closest point." MATLAB Central (2013).
[10] Bradski, Gary, and Adrian Kaehler. Learning OpenCV: Computer Vision with the OpenCV Library. O'Reilly Media, 2008.
[11] Newcombe, Richard A., Shahram Izadi, Otmar Hilliges, David Molyneaux, David Kim, Andrew J. Davison, Pushmeet Kohli, Jamie Shotton, Steve Hodges, and Andrew Fitzgibbon. "KinectFusion: Real-time dense surface mapping and tracking." In Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on, pp. 127-136. IEEE, 2011.
[12] Besl, Paul J., and Neil D. McKay. "A method for registration of 3-D shapes." IEEE Transactions on Pattern Analysis and Machine Intelligence 14, no. 2 (1992): 239-256.
[13] In Proceedings of ... Workshop.

