Registration of multiple ToF camera point clouds

Tobias Hedlund

June 23, 2010
Master's Thesis in Engineering Physics, 30 ECTS credits
Supervisor at CS-UmU: Niclas Börlin
Supervisors at Adopticum: Jonas Sjöberg & Emil Hällstig
Examiner: Christina Igasto

Umeå University
Department of Physics
SE-901 87 Umeå
Sweden
Abstract

Buildings, maps, objects and other scenes can be modeled on a computer or reconstructed in 3D from data acquired by different kinds of cameras or laser scanners. This thesis concerns the latter approach. Recent improvements of Time-of-Flight (ToF) cameras have brought a number of new and interesting research areas to the surface. Registration of several ToF camera point clouds is one such area.

A literature study was made to summarize the research done in the area over the last two decades. The most popular method for registering point clouds, the Iterative Closest Point (ICP) algorithm, has been studied. In addition, an error relaxation algorithm was implemented to minimize the accumulated error of the sequential pairwise ICP.

A few different real-world test scenarios and one scenario with synthetic data were constructed. These data sets were registered with varying outcomes. The camera poses obtained from the sequential ICP were improved by loop closing and error relaxation. The results illustrate the importance of having good initial guesses for the relative transformations in order to obtain a correct model. Furthermore, the strengths and weaknesses of the sequential ICP and the utilized error relaxation method are shown.
Acknowledgements

I would like to thank a number of people for supporting me in this thesis work. First of all, I would like to thank Niclas Börlin, who introduced me to the area of photogrammetry and image analysis a few years back. Without his enthusiasm and support as main supervisor, this wouldn't have been possible. I would also like to thank my contacts at Adopticum, Jonas Sjöberg and Emil Hällstig, for their support and for making me feel at home in the company office. Thanks to my examiner, Christina Igasto, for helping me with the report and for answering my LaTeX questions. Finally, thanks to Maria for listening to all my complaints and worries these last four and a half months.
Contents

1 Introduction
  1.1 Background
  1.2 Aims
  1.3 Related Work

2 Theory
  2.1 The Pinhole Camera Model
  2.2 ToF cameras
  2.3 Rigid body transformations
  2.4 Quaternions
  2.5 Methods for estimating transformations
      2.5.1 SIFT
      2.5.2 RANSAC
  2.6 Calibration
  2.7 ToF camera errors
  2.8 Filtering
  2.9 ICP
      2.9.1 Estimating rigid body transformations (The Procrustes Problem)
      2.9.2 Selection of points
      2.9.3 How to match points
      2.9.4 Rejection of point pairs
      2.9.5 kd-trees
  2.10 Loop closing and error relaxation

3 Implementation
  3.1 Preprocessing
  3.2 Pairwise ICP
  3.3 Multi-view registration
  3.4 Loop closing
  3.5 Global optimization

4 Experiments
  4.1 Test cases
      4.1.1 Translation of the camera
      4.1.2 Room with reference objects
      4.1.3 Empty room
      4.1.4 The torso
      4.1.5 The box
      4.1.6 The synthetic data

5 Results
  5.1 Translation of the camera
  5.2 Room with reference objects
  5.3 Empty room
  5.4 Rotating torso
  5.5 Rotating box
  5.6 The synthetic data

6 Discussion

7 Conclusions

8 Future work

References
List of Figures

2.1 The pinhole camera model. The camera center, C, the principal point, p, and the focal length, f.
2.2 The SR4000 ToF camera. Arrows show the right-handed coordinate system and the direction of the X, Y and Z axes, as well as the angles of rotation: roll, pitch and yaw.
2.3 Schematic illustration of the quaternion angle representation.
2.4 Matched SIFT keypoints between two amplitude images of a ToF camera.
2.5 Point fitting to a plane with RANSAC. Inliers, i.e. accepted points, are marked by green dots. Points farther away than the distance threshold, t, are marked by red dots.
2.6 Two common artifacts of ToF cameras.
2.7 Neighborhood angle filter proposed by May et al. (2009). Two neighboring pixels on a sensor are marked in blue; pi and pi,n are points in space.
2.8 ICP alignment of a dataset onto a model. Dataset position before registration: blue dots. Dataset position after registration: red dots.
2.9 Methods for matching points.
2.10 Graph edges between views, i.e. cameras, which are close enough.

3.1 Model used to illustrate filtering techniques.
3.2 Different filtering techniques.
3.3 The point-to-point method for determining corresponding points between datasets.
3.4 Illustration of the loop detecting algorithm. The large camera has just been added. The cameras within the wireframe sphere fulfill the distance criterion. The cameras surrounded by a green circle also have a similar viewing direction, and thus they are matchable. Crossed-over cameras are too close behind to form a loop.

4.1 Translation of the camera.
4.2 The room in which the rotation scenario was done. Chairs are placed evenly spaced as reference objects.
4.3 Three amplitude images from the case of the rotating torso.
4.4 The box on a stand.
4.5 The synthetic data constructed with Matlab's peaks function.

5.1 Translation of the camera aiming at an office desk.
5.2 Approximate cumulative error in the y and z position of the camera.
5.3 Values of the function being minimized, equation 2.10, during the optimization.
5.4 The reconstructed room with chairs.
5.5 A close-up of the camera positions from the reconstructed room with chairs.
5.6 Approximate cumulative error in the x, y and z position of the camera.
5.7 The reconstructed empty room.
5.8 Reconstruction of a human torso.
5.9 Reconstruction of a non-cubic box.
5.10 The optimized box from the top view. Comparison between actual measures (red) and measures from the reconstructed model (blue).
5.11 Registration of the synthetic data for σ = 0.09 meters.
5.12 The mean camera pose improvement after the optimization for different noise levels.
Chapter 1
Introduction

1.1 Background
Laser scanners and Time-of-Flight (ToF) cameras can be used to measure huge amounts of 3D coordinates in a short period of time. These data can be used to reconstruct objects or areas if scans are taken from multiple angles. Since the 3D coordinates are relative to the scanning device, all points must eventually be moved into the same coordinate system to make the model complete. This fusing of point clouds is often referred to as registration.

Registration can be made from aerial laser scans or aerial photography (Rönnholm et al., 2008). Buildings can be reconstructed to create digital maps, for instance for Google Earth, Bing Maps, machine simulators or realistic computer games (Boström et al., 2008; Gruen and Akca, 2005; Akca, 2007). Maps can be made using mobile robots in unknown and possibly hazardous environments, for example in underground mining (Magnusson, 2006). A technique called Simultaneous Localization And Mapping (SLAM) is often used by robots and autonomous vehicles to build up such a map (Prusak et al., 2008; Nüchter et al., 2007; Borrmann et al., 2008). Within the medical field, organs can be reconstructed using data from computed tomography (CT) (Almhdie et al., 2007). Scans are also used to digitize famous sculptures and monuments (Bae and Lichti, 2008). In addition to the applications mentioned above, a variety of others are linked to registration through the field of 3D object recognition (Mian et al., 2006).

In the past, the usual source of data for reconstruction has been laser scanners, because they give accurate measurements and provide a very dense set of points in each scan. Measurement accuracies can be as good as
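To make the notion of registration concrete, the following minimal MATLAB sketch (not taken from the thesis; the point cloud P and the transformation parameters R and t are placeholder assumptions) shows how a rigid body transformation, once estimated by e.g. ICP, moves a point cloud from the camera's local frame into a common world frame:

    % Minimal sketch: expressing a point cloud in a common coordinate
    % system, given an estimated rigid body transformation (R, t).
    % The point cloud is assumed stored as a 3-by-N matrix.

    P = rand(3, 1000);       % placeholder 3-by-N point cloud in camera frame
    R = eye(3);              % placeholder 3-by-3 rotation matrix
    t = [0.1; 0; 0];         % placeholder 3-by-1 translation vector

    % Rotate every point, then translate; repmat expands t to 3-by-N.
    P_world = R * P + repmat(t, 1, size(P, 2));

Applying the corresponding (R, t) of each view maps all point clouds into one coordinate system, which is exactly the step referred to above as fusing, or registering, the clouds.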