Inertial Navigation System Assisted Visual Localization

by

Dennis Krys

BEng, University of Victoria, 2006

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

The College of Graduate Studies

(Electrical Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA (Okanagan)

March 2011

© Dennis Krys, 2011

Abstract

With recent advancements in Global Positioning Systems (GPS), localization systems are now typically equipped with a GPS receiver. However, in a large variety of environments and real-world applications, GPS-based localization systems are not practical. This research tackles such a problem and illustrates the idea of fusing a camera and an inertial navigation system (INS) to create a dead reckoning localization system. The original purpose of the localization system is for a pipe inspection robot, but the proposed concepts can be readily applied to any environment where there is a wall in close proximity.

The proposed sensor system can determine motions with up to six degrees of freedom using a camera and an INS. The system must assume a geometry for the wall, such as a flat surface, a wall in a hallway, or the round surface of the inside of a pipe. If the geometry of the wall is unknown, then another sensor, such as a laser rangefinder, can be added to measure the range and estimate the overall shape of the wall.

The localization system uses a combination of optical flow and image registration to obtain information about six degrees of freedom from a wall with little to no features. The INS provides an estimated motion for the camera system. The estimated motion is used by the optical flow algorithm to reduce the computational load significantly, and by the image registration to decrease the likelihood of the algorithm diverging.

The system is validated using numerical simulation and experiments. The experiments were conducted on a test platform constructed specifically in this research project to simulate the motion of a free-swimming robot inside a pipe. The simulator uses accurate position sensors to measure the exact location of the proposed localization system and compare it with the results obtained from the latter. Both the numerical simulation results and the results from the simulator are in agreement with the readings of the localization system developed in this research.

Preface

Parts of the research contained in this thesis were published in three conference papers [1-3]. In all three papers, I gathered all the data and did all the processing and analysis. The first paper [1] mainly concentrated on characterizing the simulator, which is described in Section 4.1. It also described the performance of the camera, an INS, and a laser range finder for localization purposes. It is noted that much later in my work the laser range finder would be removed, but at this point it was still being used.

The second paper [2] concentrated on data fusion with the camera, INS, and laser range finder. This paper focused on the use of the INS for block matching, see Section 3.2.1. All results in this paper were for 2D localization only.

The third paper [3] concentrated on expanding what was learned in the 2D experiments and enhancing it into 3D. By this time only the INS and camera were being used for navigation. Concepts like the Lucas-Kanade algorithm, see Sections 1.4.5 and 3.2.2, and image registration, see Section 3.3, were incorporated into the localization algorithm.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
  1.1 Motivations
  1.2 Objectives
  1.3 Methodology
  1.4 Literature Review
    1.4.1 Localization Problem
    1.4.2 Sensor Fusion through Gaussian Filters
    1.4.3 Dead Reckoning
    1.4.4 Image Mosaicing and Optical Flow
    1.4.5 Lucas-Kanade
  1.5 Contributions
2 Robot Model and Camera Model
  2.1 Robot Modeling
  2.2 Camera Modeling
3 Localization Based on Fusion of the INS and Vision System
  3.1 Camera Model
  3.2 Feature Matching
    3.2.1 Block Find
    3.2.2 Lucas-Kanade
    3.2.3 Selecting Feature Points
  3.3 Image Registration
  3.4 EKF Technique
4 Experimental Setup and Results
  4.1 Simulator
  4.2 Sensors
  4.3 Result Categories
  4.4 Simulated Motions
  4.5 Experiment Parameters
  4.6 Computer Generated Results
  4.7 Results from Real Experiments
5 Conclusions and Future Work
Bibliography

List of Tables

1.1 Kalman Filter
1.2 Extended Kalman Filter
1.3 Unscented Kalman Filter Prediction Step
1.4 Unscented Kalman Filter Update Step
1.5 Lucas-Kanade
1.6 Computation Cost
1.7 Limitations on Algorithms
4.1 Image Registration Results with Motion Prediction and No Noise
4.2 Image Registration Results with No Motion Prediction and No Noise
4.3 Image Registration Results with Motion Prediction
4.4 Image Registration Results with No Prediction
4.5 Frame Skip Number to Time Conversion
4.6 Computer Generated Results Summary
4.7 Simulator Results Summary

List of Figures

1.1 Gavia
1.2 Optical Flow Image
2.1 Underwater Robot Coordinate Frame
2.2 Unicycle
2.3 Convert Coordinate Frames
2.4 Pin hole Camera Model
2.5 Non-inverted Camera Model
3.1 Coordinate Frame Conversion
3.2 Block Matching
3.3 Image Registration
4.1 Schematic of the Simulator
4.2 Picture of the Simulator
4.3 INS Stationary Characteristics
4.4 Image Examples
4.5 Image Registration Results with Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.6 Image Registration Results with No Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.7 Image Registration Results with Motion Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.8 Image Registration Results with No Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)
4.9 Simulated X Distance, Case 1, Frame Skip 1
4.10 Simulated X Velocity, Case 1, Frame Skip 1
4.11 Simulated Y Distance, Case 1, Frame Skip 1
4.12 Simulated Y Velocity, Case 1, Frame Skip 1
4.13 Simulated Z Distance, Case 1, Frame Skip 1
4.14 Simulated Z Velocity, Case 1, Frame Skip 1
4.15 Simulated Pitch, Case 1, Frame Skip 1
4.16 Simulated Roll, Case 2, Frame Skip 10
4.17 Simulated Yaw, Case 1, Frame Skip 1
4.18 Simulated X Distance, Case 1, Frame Skip 20
4.19 Simulated X Velocity, Case 1, Frame Skip 20
4.20 Simulated Y Distance, Case 1, Frame Skip 20
4.21 Simulated Y Velocity, Case 1, Frame Skip 20
4.22 Simulated Z Distance, Case 1, Frame Skip 20
4.23 Simulated Z Velocity, Case 1, Frame Skip 20
4.24 Pitch, Case 1, Frame Skip 20
4.25 Simulated Roll, Case 1, Frame Skip 20
4.26 Simulated Yaw, Case 1, Frame Skip 20
4.27 Simulated X Distance, Case 3, Frame Skip 1
4.28 Simulated X Velocity, Case 3, Frame Skip 1
4.29 Simulated Y Distance, Case 3, Frame Skip 1
4.30 Simulated Y Velocity, Case 3, Frame Skip 1
4.31 Simulated Z Distance, Case 3, Frame Skip 1
4.32 Simulated Z Velocity, Case 3, Frame Skip 1
4.33 Simulated Pitch, Case 3, Frame Skip 1
4.34 Simulated Roll, Case 3, Frame Skip 10
4.35 Simulated Yaw, Case 3, Frame Skip 1
4.36 Simulated X Distance, Case 3, Frame Skip 20
4.37 Simulated X Velocity, Case 3, Frame Skip 20
4.38 Simulated Y Distance, Case 3, Frame Skip 20
4.39 Simulated Y Velocity, Case 3, Frame Skip 20
4.40 Simulated Z Distance, Case 3, Frame Skip 20
4.41 Simulated Z Velocity, Case 3, Frame Skip 20
4.42 Pitch, Case 3, Frame Skip 20
4.43 Simulated Roll, Case 3, Frame Skip 20
4.44 Simulated Yaw, Case 3, Frame Skip 20
4.45 Block Matching Results
4.46 Block Matching Results (Zoom)
4.47 Block Matching Results Pixel Movement
4.48 X Distance, Case 1, Frame Skip 1
4.49 X Velocity, Case 1, Frame Skip 1
4.50 Y Distance, Case 1, Frame Skip 1
4.51 Y Velocity, Case 1, Frame Skip 1
4.52 Z Distance, Case 1, Frame Skip 1
4.53 Z Velocity, Case 1, Frame Skip 1
4.54 Pitch, Case 1, Frame Skip 1
4.55 Roll, Case 1, Frame Skip 1
4.56 Yaw, Case 1, Frame Skip 1
4.57 X Distance, Case 1, Frame Skip 20
4.58 X Velocity, Case 1, Frame Skip 20
4.59 Y Distance, Case 1, Frame Skip 20
4.60 Y Velocity, Case 1, Frame Skip 20
4.61 Z Distance, Case 1, Frame Skip 20
4.62 Z Velocity, Case 1, Frame Skip 20
4.63 Pitch, Case 1, Frame Skip 20
4.64 Roll, Case 1, Frame Skip 20
4.65 Yaw, Case 1, Frame Skip 20
4.66 X Distance, Case 3, Frame Skip 1
4.67 X Velocity, Case 3, Frame Skip 1
4.68 Y Distance, Case 3, Frame Skip 1
4.69 Y Velocity, Case 3, Frame Skip 1
4.70 Z Distance, Case 3, Frame Skip 1
4.71 Z Velocity, Case 3, Frame Skip 1
4.72 X Distance, Case 3, Frame Skip 20
4.73 X Velocity, Case 3, Frame Skip 20
4.74 Y Distance, Case 3, Frame Skip 20
4.75 Y Velocity, Case 3, Frame Skip 20
4.76 Z Distance, Case 3, Frame Skip 20
4.77 Z Velocity, Case 3, Frame Skip 20

Acknowledgments

I would like to express my gratitude to the faculty and staff at UBC as well as my fellow coworkers at NRC-IRC. Without their support this endeavor would not have been possible. In particular, I would like to thank Homayoun Najjarian for his support and dedication. He guided me through this stage of my life and was always there when I needed him. He was more than just an academic adviser and mentor; he was also my friend.

I would also like to thank my friends for their understanding and support, for being there when I needed them, and for providing well-needed breaks and moral support.

Finally, I would like to thank my brother, for helping me keep things in perspective, and my parents, who raised me and guided me. Without their unconditional love I would not be the person I am today and all this would not have been possible.

1 Introduction

1.1 Motivations

Localization has been dubbed the most fundamental problem in providing a mobile robot with autonomous capabilities [4]. This holds true for a variety of environments, including autonomous underwater vehicles, which have long been desired for inspection of large water mains and high-pressure conduits without decommissioning the water service. Besides numerous difficulties in the mechanical design and building of such a robot, navigation has proved to be a major challenge [5]. Currently, an autonomous underwater vehicle (AUV) is being adapted at the National Research Council (NRC) Canada to carry non-destructive testing (NDT) sensors within in-service water mains [6]. The vehicle is an open-water torpedo-shaped modular AUV known as the Gavia AUV [7, 8], see Figure 1.1.

Figure 1.1: Gavia

In its original form, the localization of the robot is not adequate either for autonomous navigation inside aqueducts and water pipes, or for positioning the defects observed by the NDT sensors. The inspection robot is subject to a dynamic environment, so localization must be carried out in real time and must be able to deal with uncertainties associated with sensor measurements. The NDT system will include a variety of different sensors, including a vision system for the detection of defects in the pipe's inner walls. This makes the Gavia an ideal candidate for a vision-based localization system.

1.2 Objectives

The main objective of this research is to prepare an appropriate hardware and software sensor system for the Gavia that will allow it to localize itself in real time, i.e. as it travels within an in-service water main. The proposed system must be able to accommodate the ad hoc sensors required for pipe inspection as well as an existing inertial navigation system (INS) and rangefinder. This was to include a camera system, an INS, and a rangefinder.

The sensor system also needs to provide a sufficiently accurate estimation of the velocity and position of the robot within the pipe in real time so that the Gavia can adjust its speed and heading as required by the inspection sensors; typically, inspection sensors need positioning and velocity regulation for optimal performance. More precisely, fluctuations in the velocity of the robot deteriorate the readings of the NDT sensors; if an increase in water flow occurs, the sensor system must be able to sense the change in velocity and adjust the speed of the robot accordingly.

Since the objectives require robot state estimation in real time, an emphasis is placed on methods that are not only accurate but also computationally efficient.

1.3 Methodology

The research was conducted in order to provide a sensor system for a pipe inspection robot. There is a wide variety of different techniques for localization. These techniques can use a variety of different sensors, which include cameras, inertial navigation systems (INS), and global positioning systems (GPS). Typically, these techniques will not work underground or in a pipe because many rely on GPS. GPS sensors provide an absolute position, which is ideal for sensor fusion systems that are used for localization. This is different from an INS, which measures accelerations rather than an absolute position. A camera system can provide an absolute position when a list of known objects and their absolute positions is available. This is often not the case, and will not work in a pipe. Most vision sensor systems are also computationally costly and have trouble identifying objects in environments that appear the same, i.e. environments that look alike and don't have distinctive features, such as a pipe wall. INS sensors require a small computational load and are excellent for short periods of time, but they have severe drifting problems when run for longer than a couple of seconds where there is no means of resetting the error.

In this thesis a new sensor system is proposed that fuses the output of an INS and a camera. This allows the low computational cost of the INS to benefit the vision system and the vision system to alleviate the drifting problem of the INS. The INS allows an estimation of the movement between frames to be fed into the Lucas-Kanade algorithm, which is a block matching technique for feature detection. Once the common features are found in the frames, the results are passed to the image registration. The image registration extracts the motion from the change in location of the features between the two images and calculates the motion the camera performed between the images. This motion is then passed to an extended Kalman filter (EKF) to update the state space variables. This thesis will show the evolution of this sensor system from a 2D system to a 3D system. The proposed localization system is suitable for real-time operations. The implementation of the system on a real-time platform has not been within the scope of this research and is considered future work.
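To illustrate how the INS estimate reduces the vision workload, the following is a minimal sketch of predicting the expected pixel displacement between two frames from an INS velocity estimate, using the standard pinhole relation. The focal length in pixels, wall distance and frame rate are illustrative assumptions, not values from this thesis.

import numpy as np

# Rough INS-based prediction of how far features should move between frames,
# used to centre the block-matching / Lucas-Kanade search window.

def predicted_pixel_shift(velocity_xy, wall_distance, focal_px=800.0, dt=1 / 30.0):
    """Project a lateral camera velocity (m/s) onto the image plane (pixels)."""
    vx, vy = velocity_xy
    # Pinhole model: image displacement = f * (X / Z); here X = v * dt.
    return np.array([focal_px * vx * dt / wall_distance,
                     focal_px * vy * dt / wall_distance])

# Example: 0.5 m/s along x at 0.3 m from the wall -> roughly 44 px per frame.
print(predicted_pixel_shift((0.5, 0.0), wall_distance=0.3))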

1.4 Literature Review

1.4.1 Localization Problem

The problem of localization is a fundamental problem in robotics. There are a variety of different localization techniques for different environments and situations. No one method is the answer to all situations. Most solutions for localization problems today use a GPS [9-11]. The GPS provides an excellent means for obtaining absolute position data, but unfortunately it will not work in all environments, including underground and inside pipes. Another localization technique that is useful in the absence of a GPS is radio frequency (RF) localization [12]. This requires the user to either set up a series of towers in the environment or use pre-existing systems, such as WiFi or cellular. Once again, it is not feasible to set up RF towers inside a pipe for navigation purposes. This leads to the use of other sensors to obtain the position information.

Often it is not feasible to have a sensor that returns absolute position data. Then the system must rely on dead reckoning, see Section 1.4.3. When using a dead reckoning system, sensors that provide a change in state, such as an INS and/or wheel encoders, can work well; it depends on the robot's configuration and environment. However, with all dead reckoning systems, errors will build up and the localization will deteriorate. Some sensor systems are far more sensitive to errors than others. For example, an INS is not suitable for localization over an extended period of time because it determines displacement and velocity by integration of accelerations measured over time. More precisely, a small error in the measurement of acceleration at any instant will compromise the calculated values of velocity and displacement afterwards. This is why an INS only provides accurate measurements over short periods of time. This can be countered by fusing another sensor into the system that has different weaknesses and strengths. There are a variety of different methods for fusing different sensor data together, see Section 1.4.2.

For years now, sensor systems have been using cameras to help them navigate [10, 13-15]. There is a wide variety of ways to utilize a camera in a sensor system. One method is to use optical flow [15-18]. This method looks at the flow patterns of subsections of the image from one frame to the next. This pattern can then give insight into the motion the camera performed from one image to the next. The method does not give an absolute position, just a relative motion with respect to the environment. Also, this method can have issues generating the flow patterns if there are large changes in motion between sequences of images, i.e. fast motion with respect to the field of view or changing angles of the camera. More often this method is used for navigation rather than localization.

Another method is feature tracking. Watching a set of features over a set of images allows the motion of the camera to be calculated. The calculations require an accurate model of the camera system, see Section 2.2. Feature tracking is different from optical flow because in optical flow the image is broken up into a number of blocks and then each block location is found. In feature tracking, a limited number of features are tracked from one image to one or more images. These feature locations can be calculated in a number of different ways, see Section 3.2. In recent years the scale-invariant feature transform (SIFT) has become a dominant technique. If the environment is distinctive enough, then SIFT will allow tracking of features through one or more frames. When the environment isn't very distinctive, then block matching techniques can be used.

1.4.2 Sensor Fusion through Gaussian Filters

There are a large number of different kinds of filters for a variety of different tasks. Filters are classified as parametric or non-parametric. If a filter is non-parametric, then there is no assumption on the probability distribution and it can often represent nonlinear transformations [19]. An example of a non-parametric filter in robotics is the particle filter [20, 21]. The particle filter makes no assumption of distribution and can more accurately represent the nonlinear equations. Particle filters rely on a set of randomly generated state samples, particles, to represent the posterior. This differs from parametric filters, where the posterior is represented by a known parametric form such as a Gaussian distribution. A particle represents a hypothesis of what the true world state may be at time t [19]. During each iteration of the particle filter a number of the particles are removed and an equal number of particles are randomly generated. The idea behind this is to remove the particles that are least likely to represent the true state of the filter and input a new set of random particles that represent the current state better than the current set of particles. It is noted that increasing the number of particles in the filter will increase the likelihood that the probability distribution will be more accurately represented. The problem with non-parametric filters such as the particle filter is that they need a great deal more processing power than their parametric alternatives. If the number of particles is decreased to limit the processing power needed, then they often don't perform as well. Sometimes parametric filters are used in combination with non-parametric filters [21]. This allows the advantages of non-parametric filters to be used when needed and then a parametric filter can be used for the rest of the problem. Particle filters have gained in popularity because of the advances in computer systems and their easy implementation [22].

Parametric filters can be broken down into Gaussian and non-Gaussian based filters. Historically, Gaussian filters constitute the earliest tractable implementations of the Bayes filter for continuous spaces [19]. The Bayes algorithm calculates the belief distribution from measurement and control data. This allows for the estimation of the uncertainties involved in one's measurements. Gaussian techniques all share the basic idea that beliefs are represented by multivariate normal distributions [19]. Gaussian based filters require a normal distribution for noise, while non-Gaussian filters can use other distributions. Popular examples of non-Gaussian based filters are the particle filter and the histogram filter. These generally require a lot more processing power and memory than a traditional Gaussian based filter such as the Kalman filter.

Kalman filters are one of the most extensively used filters in robotics, especially in autonomous navigation applications [23]. In 1960, R.E. Kalman published his first paper on the Kalman filter. His work describes a recursive solution for linear discrete data. This became the basic form of the Kalman filter, which was unable to handle nonlinear equations. The Kalman filter is a recursive estimator made up of two parts, the prediction step and the update step, see Table 1.1. The prediction step involves predicting the current state, $\hat{x}_k^-$, from the previous state, $x_{k-1}$. The control vector, $u_{k-1}$, represents new information that helps with the prediction of the current state. Matrix $A$ is the state transition model and $B$ is the control input model. The prediction step also involves the calculation of $P_k^-$, which is the error covariance, an estimate of the state accuracy. $P_{k-1}$ is the previous error covariance and $Q$ is the covariance of the process noise, which is assumed to be Gaussian.

The second step in a Kalman filter is the update step. This involves three steps. The first step is to calculate the Kalman gain $K_k$. The Kalman gain is used to adjust the current projected state in step two of the update step. It is calculated using the error covariance, $P_k^-$, the observation matrix, $H$, and the covariance of the observation noise, $R$. Then the updated estimate is calculated using the Kalman gain, $K_k$, the current state, $\hat{x}_k^-$, the observation matrix, $H$, and the observation or measurement $z_k$. The final step is to update the error covariance $P_k$.

Kalman Filter Prediction Step
1. Project the state ahead: $\hat{x}_k^- = A x_{k-1} + B u_{k-1}$
2. Project the error covariance ahead: $P_k^- = A P_{k-1} A^T + Q$

Kalman Filter Update Step
1. Compute the Kalman gain: $K_k = P_k^- H^T (H P_k^- H^T + R)^{-1}$
2. Update estimate with measurement $z_k$: $\hat{x}_k = \hat{x}_k^- + K_k (z_k - H \hat{x}_k^-)$
3. Update the error covariance: $P_k = (I - K_k H) P_k^-$

Table 1.1: Kalman Filter

The Kalman filter has been improved in a variety of ways to address certain problems. Commonly used in the field of robotics, the extended Kalman filter (EKF) was developed to address the issue of nonlinear equations. The EKF is a well known and powerful algorithm for fusing data [14, 24]. Similar to the Kalman filter, the EKF is a recursive filter composed of two main parts: the prediction step and the update step [19, 25, 26], see Table 1.2. Unlike the Kalman filter, the state transition functions can be nonlinear, but they have to be differentiable functions. Other changes are as follows: matrix $A$ has been replaced by matrix $F$, which is the Jacobian of a linearized set of equations of the state transition model. For the update step, the function $h$ is the observation prediction and the matrix $H$ is the Jacobian of the observation prediction.

Extended Kalman Filter Prediction Step
1. Project the state ahead: $\hat{x}_k^- = f(\hat{x}_{k-1}, u_{k-1})$
2. Project the error covariance ahead: $P_k^- = F_k P_{k-1} F_k^T + Q_{k-1}$

Extended Kalman Filter Update Step
1. Compute the Kalman gain: $K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R)^{-1}$
2. Update estimate with measurement $z_k$: $\hat{x}_k = \hat{x}_k^- + K_k (z_k - h(\hat{x}_k^-))$
3. Update the error covariance: $P_k = (I - K_k H_k) P_k^-$

Table 1.2: Extended Kalman Filter

Another form of the Kalman filter that has been gaining popularity is the unscented Kalman filter (UKF). The UKF is an alternative to the EKF. The input and output are identical to the EKF, but it handles nonlinear equations differently. Instead of having to calculate the Jacobian of the state space to get a linear approximation of the equations, which can be very difficult if large nonlinear equations are involved, it uses sigma points and propagates these through the model. The resulting mean and variance of the sigma points passing through the model are then computed. The results of the UKF are often equal to or better than those of the EKF, but at a slight cost to speed. In practice the EKF is often faster than a UKF [19].

Just like the Kalman filter and the EKF, the UKF has a prediction step, see Table 1.3, and an update step, see Table 1.4. The prediction step involves the calculation of the sigma points $\hat{x}_{k-1}^i$ and their respective weights $w_m^i$ and $w_c^i$, where $i = 0 \ldots 2L$ and $L$ is the number of augmented states. It is noted that $\alpha$ and $\kappa$ are tuning parameters that determine how far the sigma points are spread from the mean. Also, the parameter $\beta$ can be used to encode more information about the distribution. If the distribution is an exact Gaussian, then 2 is the optimal value for $\beta$. The output from the prediction step, $\hat{x}_k^-$ and $P_k^-$, is equivalent to that of the EKF.

Unscented Kalman Filter Prediction Step
1. Calculate the $2L + 1$ sigma points:
   $\hat{x}_{k-1}^0 = \hat{x}_{k-1}$
   $\hat{x}_{k-1}^i = \hat{x}_{k-1} + \gamma \sqrt{P_{k-1}}$ where $i = 1 \ldots L$
   $\hat{x}_{k-1}^i = \hat{x}_{k-1} - \gamma \sqrt{P_{k-1}}$ where $i = L+1 \ldots 2L$
   $\gamma = \sqrt{L + \lambda}$
2. Propagate the sigma points through the transition function: $\hat{x}_k^i = f(\hat{x}_{k-1}^i, u_{k-1})$
3. Project the state ahead: $\hat{x}_k^- = \sum_{i=0}^{2L} w_m^i \hat{x}_k^i$
4. Project the error covariance ahead: $P_k^- = \sum_{i=0}^{2L} w_c^i (\hat{x}_k^i - \hat{x}_k^-)(\hat{x}_k^i - \hat{x}_k^-)^T$
5. The weights for the state and covariance are given by:
   $w_m^0 = \frac{\lambda}{L+\lambda}$, $w_c^0 = \frac{\lambda}{L+\lambda} + (1 - \alpha^2 + \beta)$, $w_m^i = w_c^i = \frac{\lambda}{2(L+\lambda)}$ where $i = 1 \ldots 2L$, and $\lambda = \alpha^2 (L + \kappa) - L$

Table 1.3: Unscented Kalman Filter Prediction Step

Just like the prediction step, the update step requires the calculation of the sigma points $\hat{x}_k^i$ and uses the same weight equations as the prediction step. Also, the observation function $h$ is the same as in the EKF. The resulting output from the UKF is $\hat{x}_k$ and $P_k$, which is equivalent to the EKF.

Unscented Kalman Filter Update Step
1. Calculate the $2L + 1$ sigma points:
   $\hat{x}_k^0 = \hat{x}_k^-$
   $\hat{x}_k^i = \hat{x}_k^- + \gamma \sqrt{P_k^-}$ where $i = 1 \ldots L$
   $\hat{x}_k^i = \hat{x}_k^- - \gamma \sqrt{P_k^-}$ where $i = L+1 \ldots 2L$
2. Propagate the sigma points through the observation function $h$: $\bar{z}_k^i = h(\hat{x}_k^i)$ where $i = 0 \ldots 2L$
3. Compute the predicted measurement: $\hat{z}_k = \sum_{i=0}^{2L} w_m^i \bar{z}_k^i$
4. Compute the predicted measurement covariance: $P_{z_k} = \sum_{i=0}^{2L} w_c^i (\bar{z}_k^i - \hat{z}_k)(\bar{z}_k^i - \hat{z}_k)^T$
5. Compute the state-measurement cross-covariance matrix: $P_{x_k} = \sum_{i=0}^{2L} w_c^i (\hat{x}_k^i - \hat{x}_k^-)(\bar{z}_k^i - \hat{z}_k)^T$
6. Compute the Kalman gain: $K_k = P_{x_k} P_{z_k}^{-1}$
7. Update estimate with measurement $z_k$: $\hat{x}_k = \hat{x}_k^- + K_k (z_k - \hat{z}_k)$
8. Update the error covariance: $P_k = P_k^- - K_k P_{z_k} K_k^T$
9. The weights for the state and covariance are given by:
   $w_m^0 = \frac{\lambda}{L+\lambda}$, $w_c^0 = \frac{\lambda}{L+\lambda} + (1 - \alpha^2 + \beta)$, $w_m^i = w_c^i = \frac{\lambda}{2(L+\lambda)}$ where $i = 1 \ldots 2L$, and $\lambda = \alpha^2 (L + \kappa) - L$

Table 1.4: Unscented Kalman Filter Update Step
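To make the prediction and update steps of Table 1.1 concrete, the following is a minimal NumPy sketch of the linear Kalman filter. The constant-velocity example at the end is purely illustrative and is not one of the models developed in this thesis.

import numpy as np

# Matrix names follow Table 1.1: A state transition, B control input,
# H observation, Q process noise covariance, R observation noise covariance.

def kf_predict(x, P, u, A, B, Q):
    x_pred = A @ x + B @ u               # project the state ahead
    P_pred = A @ P @ A.T + Q             # project the error covariance ahead
    return x_pred, P_pred

def kf_update(x_pred, P_pred, z, H, R):
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)    # update estimate with measurement z
    P = (np.eye(len(x)) - K @ H) @ P_pred
    return x, P

# Illustrative example: 1D constant-velocity state, position measured directly.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.zeros((2, 1))
H = np.array([[1.0, 0.0]])
Q = 1e-4 * np.eye(2)
R = np.array([[1e-2]])
x, P = np.zeros(2), np.eye(2)
x, P = kf_predict(x, P, np.zeros(1), A, B, Q)
x, P = kf_update(x, P, np.array([0.05]), H, R)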

1.4.3 Dead Reckoning

Dead reckoning is the process of keeping track of one's position without using an instrument that determines the absolute position. Usually one starts the dead reckoning process from a known position. Then speed and heading measurements are taken, and the position is calculated. This method dates back hundreds of years and has been used in a variety of applications including marine, air and automotive navigation. For example, dead reckoning was the primary method of determining position when sailors, such as Christopher Columbus, crossed the Atlantic Ocean. It wasn't until the innovation of the marine chronometer in the 18th century that navigation using dead reckoning started to become obsolete.

The major problem with dead reckoning is that each measurement depends on previous measurements. So, if an error occurs, then that error will affect the final result until a known position is reached where both the process and the error can be reset. Depending on the nature of the errors, i.e. the magnitude and distribution of the noise, the position will degrade at different rates.

Today dead reckoning is rarely used for extended periods of time because of technologies such as GPS. GPS provides an inexpensive, yet effective way to locate one's position. The problem with GPS is that it does not work everywhere. So dead reckoning is still preferable in places where GPS doesn't work, or to help augment position data from GPS readings.

A popular dead reckoning device is the inertial navigation system (INS). This device is used widely, from advanced aircraft systems to navigation systems in automobiles. Typically, the INS readings are fused with other instruments to provide reliable position data in a variety of conditions [9, 11, 27].

1.4.4 Image Mosaicing and Optical Flow

Image mosaicing is the process of overlapping 2D images to create larger images. This process can give the impression of a global image covering a larger area of interest, which is actually comprised of many images. In order to combine images, it is required to know how these images fit together. This can be achieved in a variety of ways, including external information such as the movement of the camera, as well as feature matching. Image mosaicing can be very useful in visual inspections, because it gives the illusion of a continuous image that can be viewed.

Optical flow is the process of estimating motion within a set of camera images. This can be used to estimate an object moving within a camera image or to estimate the motion of the camera itself. Usually a set of vectors is produced illustrating the motion within an image, see Figure 1.2. Optical flow has become very popular as a form of motion estimation in applications such as robotics [18, 28], and in tracking block movement to help in compressing video images [16, 29]. For now this thesis will concentrate on motion estimation.

Figure 1.2: Optical Flow Image

If the camera is moving, and not the objects within the camera's view, then the processes of image mosaicing and of optical flow to estimate the motion are basically the same while producing different results. In optical flow, sets of features are found and the flow within the images is tracked. Assuming that the camera is moving and not an object in the camera's view, image mosaicing and optical flow have the same solution, but different outcomes. One's purpose is to fit overlapping images together; the other is to estimate the motion within the image. To complete both tasks, features within the image must be matched. This can be performed in a variety of ways.

Two common techniques for finding features within overlapping images are block matching and SIFT. SIFT has quickly become popular within the computer vision field [30-32]. SIFT allows for features to change, and is robust against changes that include image scale, noise, local geometric distortion and illumination. The problem with SIFT is that it requires uniqueness among the features, i.e. the image environment cannot look the same. The features must be distinct enough to allow for changes in the image and to prevent false matching.

Block matching, on the other hand, is a much more rudimentary method, but still widely used [33-35]. The concept is to take small portions (the size can be varied in order to achieve better results for different environments) of the image from the anchor frame and look for them in the target frame. It is noted that the anchor frame is the image where the features are taken from and the target frame is the image where the features are found. This method works very well when the perspective of the camera isn't changing and the lighting remains the same. When the perspective changes it will cause the target frame to change, so that the blocks from the anchor frame no longer match the target frame. This can be overcome if the perspective changes little from the anchor frame to the target frame, or if a motion estimate is known and the blocks can be deformed to match the target frame. Block matching can be a successful alternative to SIFT when the image doesn't have distinctive features, such as a wall. Otherwise SIFT is a much more robust implementation.

In order for block matching to work, the algorithm must find a match for the block. There are a variety of different search techniques. The ideal search technique is an exhaustive search, where every possible combination is tried and the best matching result is chosen. The problem with this method is that it requires a lot of processing time. The alternatives are to search a subset region of the target frame and/or use different search algorithms that don't require every combination to be tried. Some popular search algorithms are the 3-step search and the diamond search [36, 37]. The idea of all these searches is to quickly converge on the region where the block should be, without having to try all the combinations. This generally only works well if the image is dissimilar. If the image is more or less the same, then most search techniques apart from an exhaustive search will generally converge on the wrong region.
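As an illustration of the search described above, the following is a minimal sketch of exhaustive block matching with a sum-of-squared-differences (SSD) score, restricted to a window around a predicted displacement (in this thesis the prediction comes from the INS). The block size, search radius and synthetic test image are arbitrary choices for the example.

import numpy as np

def match_block(anchor, target, top_left, block=16, predicted=(0, 0), radius=8):
    """Find the displacement of one anchor-frame block inside the target frame."""
    y0, x0 = top_left
    patch = anchor[y0:y0 + block, x0:x0 + block].astype(np.float32)
    best, best_d = None, np.inf
    for dy in range(predicted[0] - radius, predicted[0] + radius + 1):
        for dx in range(predicted[1] - radius, predicted[1] + radius + 1):
            y, x = y0 + dy, x0 + dx
            if y < 0 or x < 0 or y + block > target.shape[0] or x + block > target.shape[1]:
                continue
            cand = target[y:y + block, x:x + block].astype(np.float32)
            d = np.sum((patch - cand) ** 2)   # SSD score
            if d < best_d:
                best, best_d = (dy, dx), d
    return best, best_d

# Example with a synthetic 3-pixel vertical shift.
rng = np.random.default_rng(0)
img = rng.random((64, 64))
shifted = np.roll(img, (3, 0), axis=(0, 1))
print(match_block(img, shifted, (24, 24)))   # expected displacement (3, 0)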

1.4.5 Lucas-Kanade

The Lucas-Kanade algorithm is an optical flow method that is very popular in computer vision [38-40]. Reference [41] says the Lucas-Kanade algorithm is easily applied to a subset of points in the input image and that it has become an important sparse technique. The basic Lucas-Kanade algorithm makes three assumptions, as follows: the brightness is consistent, temporal persistence (small movements), and spatial coherence. Spatial coherence means that similar neighboring points in the scene belong to the same surface and have a similar motion. One major problem with these assumptions is the temporal persistence. The field of view of the camera system on the AUV is small because of the relatively short distance between the camera and the wall. This leads to large motion vectors with small movements of the camera. The motions are often close to half a frame width when the AUV is traveling at its usual speed. In the current block matching algorithm this isn't a problem, since the INS provides enough information to choose a block in the optimal location in the anchor frame and predict the location at which to find that block in the target frame. A similar system will be needed to predict the location of the block match so that the algorithm gets a close enough prediction to converge. Reference [41] describes a pyramid Lucas-Kanade algorithm that allows greater movement.

References [42, 43] describe many variations of the Lucas-Kanade algorithm. The goal of the Lucas-Kanade algorithm is to minimize the sum of squared error between two images, the template T and the image I. The image I is warped onto the coordinate frame of the template to create a match. The warping of I requires interpolating the image at sub-pixel locations. The basic Lucas-Kanade algorithm includes nine steps, outlined in Table 1.5. This is the Forward Additive class of the algorithm.

1. Warp $I$ with $W(x; p)$ to compute $I(W(x; p))$
2. Compute the error image $T(x) - I(W(x; p))$
3. Warp the gradient $\nabla I$ with $W(x; p)$
4. Evaluate the Jacobian $\frac{\partial W}{\partial p}$ at $(x; p)$
5. Compute the steepest descent images $\nabla I \frac{\partial W}{\partial p}$
6. Compute the Hessian matrix $H$
7. Compute $\sum_x \left[\nabla I \frac{\partial W}{\partial p}\right]^T \left[T(x) - I(W(x; p))\right]$
8. Compute $\Delta p = H^{-1} \sum_x \left[\nabla I \frac{\partial W}{\partial p}\right]^T \left[T(x) - I(W(x; p))\right]$
9. Update the parameters $p \leftarrow p + \Delta p$ until $\|\Delta p\| \leq \epsilon$

Table 1.5: Lucas-Kanade

Other varieties of the Lucas-Kanade algorithm are also described in [42, 43]. These varieties can be grouped into four classes. They are classified by the quantity approximated and the warp update rule. These include the Forward Additive Image Alignment, Forward Compositional Image Alignment, Inverse Compositional Image Alignment, and Inverse Additive Image Alignment.

The Forward Compositional Image Alignment, instead of simply adding the additive updates to the current estimates of the parameters as is done in the Lucas-Kanade algorithm, must update the warp by composition. This puts two limits on the warps, as follows: they must contain the identity, and the set of warps must be closed under composition. The set of warps must therefore be a semi-group. These limitations are not usually a problem because most computer vision techniques conform to these requirements.

The Inverse Compositional Image Alignment is basically a reformulation in which the Hessian matrix becomes a constant. This puts limitations on the warps because they have to be inverted before they can be composed with the current estimate. This is not usually a problem.

The goal of the Inverse Additive algorithm is to remove the Hessian matrix from the Forward Additive algorithm, as was done for the Inverse Compositional with respect to the Forward Compositional algorithm. This leads to a number of problems. First, it is far more complex to derive, and second, it can only be applied to a very limited set of warps. This includes mostly simple 2D linear warps such as translations and affine warps. The computational cost of each algorithm and its limitations are summarized in Table 1.6 and Table 1.7.

Algorithm                          | Pre-Computation | Per Iteration
Lucas-Kanade                       | -               | O(n^2 N + n^3)
Compositional                      | O(nN)           | O(n^2 N + n^3)
Inverse Compositional              | O(n^2 N)        | O(nN + n^3)
Hager-Belhumeur (Inverse Additive) | O(k^2 N)        | O(nN + kN + k^3)

Table 1.6: Computation Cost

Algorithm              | Efficient | Can be Applied To
Forward Additive       | No        | Any Warp
Forward Compositional  | No        | Any Semi-Group
Inverse Additive       | Yes       | Simple Linear 2D+
Inverse Compositional  | Yes       | Any Group

Table 1.7: Limitations on Algorithms

The Forward Additive version of the algorithm was selected because it had the fewest restrictions. Further details of the implementation of the Lucas-Kanade algorithm are discussed in Section 3.2.2.
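For reference, the following is a minimal sketch of how a pyramidal Lucas-Kanade tracker can be seeded with an externally predicted displacement, here using OpenCV's implementation rather than the Forward Additive implementation developed in this thesis. The feature-detection parameters and the predicted shift are illustrative assumptions; prev_gray and next_gray are single-channel images.

import cv2
import numpy as np

def track_with_prediction(prev_gray, next_gray, predicted_shift=(40.0, 0.0)):
    # Detect a sparse set of features in the anchor frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=50,
                                  qualityLevel=0.01, minDistance=10)
    if pts is None:
        return None
    # Initial guess: every feature is assumed to move by the predicted shift,
    # so the iterative search starts near the correct displacement.
    guess = pts + np.array(predicted_shift, dtype=np.float32)
    next_pts, status, err = cv2.calcOpticalFlowPyrLK(
        prev_gray, next_gray, pts, guess,
        winSize=(21, 21), maxLevel=3,
        flags=cv2.OPTFLOW_USE_INITIAL_FLOW)
    good = status.ravel() == 1
    return pts[good], next_pts[good]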

1.5 Contributions

The main contributions of this research can be summarized in two major items. The first contribution is a test platform where data can be gathered to test localization algorithms. The test platform was designed to simulate different motions that will be experienced by an AUV inside a water main. The simulator is placed close to a wall with no distinct features. A cart capable of holding a variety of sensors, including a camera, an INS and a laser displacement sensor, is moved along the wall in different paths. A set of control boards controls two servo motors: one motor controls the speed and direction of the cart along a set of tracks, while the second motor controls the angle of the platform that the tracks are attached to. This allows the cart to change angle, distance and velocity with respect to the wall, providing a large variety of different motion patterns that can be used for testing purposes. For further details on the simulator see Section 4.1.

The second contribution is a sensor system that fuses an INS and a camera together. This sensor system allows 3D motion estimation with only a single camera and an INS. This thesis shows the shortcomings of a variety of different techniques used to try to complete this task, as well as a successful implementation. The environment this sensor system has to perform in is also difficult because the camera will not have any distinct features that it can use to navigate, i.e. the inside of a pipe all looks the same. For this reason the wall used in the simulator is also void of distinct features. This means that feature detection algorithms will have trouble finding meaningful information in the camera images. Also, the field of view of the camera is small in comparison to the speeds at which the cart will be traveling. This means that the system must perform well when features only remain in the field of view for a couple of frames. As a result, the same feature cannot be used for localization purposes over the course of the test, i.e. different features must be selected appropriately such that they will be in the next frame and as such can be used for localization purposes. Also, the algorithm should have no problem being implemented in real time because a lot of effort was taken to minimize the computational load and allow for parallel calculations, i.e. different parts of the algorithm can be computed at the same time, allowing for a low lag between data input and result. It is also noted that inside real pipes there is water, and this water could provide another set of challenges with bubbles and flow. The challenges of water in the camera images will not be addressed in this thesis.

2 Robot Model and Camera Model

In this chapter a simplified model of the underwater vehicle will be developed.

2.1 Robot Modeling

In robotics it is desirable to know the orientation and location of the robot [9, 14, 44]. In order to represent the relative position and orientation of one rigid body with respect to another, a coordinate frame must be attached to each body [45]. For example, a coordinate frame of the underwater robot is shown in Figure 2.1. These coordinate frames must have a common reference frame in order to compare their relative position and orientation. This reference frame is typically a stationary coordinate frame. For the purpose of this thesis, the common reference frame will be called the world coordinate frame. Using the world coordinate frame, one can see the change in position of an object with respect to a fixed coordinate frame.

Figure 2.1: Underwater Robot Coordinate Frame

Models are created to describe the robot's interaction with the real world. Models are the mathematical representation of what is actually happening. One way of representing a model is with a set of state space variables. These variables represent a mathematical model of a physical system. It breaks the system down into a set of inputs, outputs and state variables which are related by a set of differential equations. Often these equations are put in a matrix with the variables expressed in vector form. This breaks up the set of equations into a form which is easy to manage [24, 44, 46, 47].

The output of a model is the set of variables that one would like to keep track of. For a robot this can be position, orientation, velocity, etc. The input into a model is the information available about the changing state of the robot, i.e. sensor data or control inputs. For example, a one-wheeled robot such as a unicycle is displayed in Figure 2.2. The inputs to this model are the velocity $v$ and the rate of rotation $\dot{\theta}$. The state variables are the position $x$ and $y$ and the orientation $\theta$. The output will be the same as the state variables, i.e. the location displayed as $x$ and $y$ and the orientation $\theta$. The output variables and state variables can be different depending on the requirements of the model. The equations for this model are given in Equation 2.1. Using these equations, the orientation and position can be calculated with a given set of velocity and rate of rotation measurements. Since most robots are a lot more complicated than a simple one-wheeled machine, more complex models are usually needed.
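The following is a minimal sketch of the unicycle state update described above (Equation 2.1, which follows Figure 2.2): the pose is advanced with a measured speed and turn rate over one time step. The sin/cos assignment follows the convention used in Equation 2.1, and the speed, turn rate and time step are illustrative values only.

import math

def unicycle_step(x, y, theta, v, theta_dot, dt):
    x_new = x + v * math.sin(theta) * dt
    y_new = y + v * math.cos(theta) * dt
    theta_new = theta + theta_dot * dt
    return x_new, y_new, theta_new

pose = (0.0, 0.0, 0.0)
for _ in range(50):                       # 5 s at 10 Hz, gentle turn
    pose = unicycle_step(*pose, v=0.5, theta_dot=0.1, dt=0.1)
print(pose)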

Figure 2.2: Unicycle

$$\begin{bmatrix} x_t \\ y_t \\ \theta_t \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_{t-1} \\ y_{t-1} \\ \theta_{t-1} \end{bmatrix} + \begin{bmatrix} t \cdot \sin(\theta_{t-1}) & 0 \\ t \cdot \cos(\theta_{t-1}) & 0 \\ 0 & t \end{bmatrix} \begin{bmatrix} v_t \\ \dot{\theta}_t \end{bmatrix} \quad (2.1)$$

To help build mathematical models, it is important to know how to describe rotation and translation changes. To describe a rotation and/or translation, a transformation matrix is used. A common method for specifying a rotation is in terms of three independent quantities called Euler angles [45]. Pitch, $\theta$, describes the rotation about the y axis, see Equation 2.3; roll, $\phi$, describes the rotation about the x axis, see Equation 2.2; and yaw, $\psi$, describes the rotation about the z axis, see Equation 2.4. To describe a translation, Equation 2.5 is used. This describes a position change in three-dimensional space.

$$R_x(\phi) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\phi & -\sin\phi & 0 \\ 0 & \sin\phi & \cos\phi & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (2.2)$$

$$R_y(\theta) = \begin{bmatrix} \cos\theta & 0 & \sin\theta & 0 \\ 0 & 1 & 0 & 0 \\ -\sin\theta & 0 & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (2.3)$$

$$R_z(\psi) = \begin{bmatrix} \cos\psi & -\sin\psi & 0 & 0 \\ \sin\psi & \cos\psi & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (2.4)$$

$$T = \begin{bmatrix} 1 & 0 & 0 & X \\ 0 & 1 & 0 & Y \\ 0 & 0 & 1 & Z \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (2.5)$$

It is often desirable to be able to translate between different coordinate frames. This can be used to describe the movement from one coordinate frame to another, for example where a vehicle's position has changed and it is desirable to know what that movement is with respect to the world coordinate frame. This can be accomplished by multiplying the vector with a combination of rotation and translation matrices. The resulting 4x4 matrix is called a transformation matrix and incorporates both translation and rotation. To perform more than one rotation and/or translation, the rotation and translation matrices can be cascaded using one of two different methods. These methods are called post-multiply and pre-multiply.

Post-multiply is when the rotations and translations are performed on an evolving reference frame. This means that if a rotation about the x axis is performed, then in post-multiply the action will be performed on the current x axis and not the original, in the event that other rotations had been performed. In pre-multiply, the rotations and translations are performed on a fixed reference frame, i.e. the rotations and translations are performed on the original reference frame.

When a combination of rotations and translations is performed, the resulting matrix will represent the transformation from one coordinate frame to the next. For example, see Figure 2.3. This figure shows two different coordinate frames. The first coordinate frame is a pure translation from the original. This leads to the origin of the axis being in a different location. The second coordinate frame is a pure rotation from the last, which is a 90 degree rotation around the Z axis. Equation 2.6 shows the combination of rotation and translation matrices when performing a post-multiply for the conversion from the original coordinates to the final coordinate system. Any number of translation and rotation matrices can be linked together to obtain the desired matrix that describes the change between two different coordinate frames.

Figure 2.3: Convert Coordinate Frames

$$Q = T R_z\!\left(\frac{\pi}{2}\right) = \begin{bmatrix} 1 & 0 & 0 & X \\ 0 & 1 & 0 & Y \\ 0 & 0 & 1 & Z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\frac{\pi}{2} & -\sin\frac{\pi}{2} & 0 & 0 \\ \sin\frac{\pi}{2} & \cos\frac{\pi}{2} & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \quad (2.6)$$

2.2 Camera Modeling

Cameras have become popular as sensors in robotics [14, 16-18, 40, 43]. A camera model is required to use a camera as a sensor. The model describes the location of, or change in, objects with reference to the position of the camera, i.e. whenever a camera moves, the objects it is viewing will also change. The movement of the objects can be described mathematically with the motion of the camera. For instance, if a camera is panning left, then the objects the camera sees will also pan and eventually exit the field of view. In order to use the camera as a navigation sensor, a model describing this relationship has to be created.

The most simple of camera models is the pin hole model, illustrated in Figure 2.4. A pin hole camera is a camera without a lens and with a single small aperture. Cameras with lenses can be modeled using a pin hole camera model. The pin hole camera model mathematically describes the relationship between the image plane and a point in the world, i.e. it describes the relationship between an object located in a three-dimensional world and its location on a two-dimensional plane. It is a popular model to use in computer vision [48].

Figure 2.4: Pin hole Camera Model

The camera model in Figure 2.4 causes the image on the image plane to be inverted. The non-inverting pinhole camera model, illustrated in Figure 2.5, can be used instead of the inverting pinhole camera model if the inverted calculations are not desirable. A set of equations can be derived using the non-inverting camera model, see Equation 2.7.


Figure 2.5: Non-inverted Camera Model

x y r f = = = X Y R Z

X, Y, Z

(2.7)

is the position of the feature with respect to the camera

x, y

is the position of the feature within the image

f

is the focal length of the camera

From this the perspective projection equations can be derived.

x=

f X Z

(2.8)

y=

f Y Z

(2.9)

From these equations the inverse prospective equations can be formulated.

27

X=

Z x f

(2.10)

Y =

Z y f

(2.11)

Unfortunately, both the direct and inverse perspective equations have a nonlinear relationship with the coordinates x and y, since they depend on f and Z. This leads to the assumption that the distance between two points seen by the camera is much smaller than the average distance \(\bar{Z}\), which gives Equations 2.12 and 2.13.

\[
x \approx \frac{f X}{\bar{Z}}
\tag{2.12}
\]

\[
y \approx \frac{f Y}{\bar{Z}}
\tag{2.13}
\]
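As a quick illustration of the perspective and inverse perspective equations, the short Python sketch below (illustrative only; the focal length and point values are made up) projects a 3D point onto the image plane and recovers it again from a known depth.

```python
import numpy as np

f = 500.0  # focal length in pixels (assumed value)

def project(X, Y, Z, f):
    """Perspective projection, Equations 2.8 and 2.9."""
    return f * X / Z, f * Y / Z

def back_project(x, y, Z, f):
    """Inverse perspective, Equations 2.10 and 2.11 (depth Z must be known)."""
    return Z * x / f, Z * y / f, Z

X, Y, Z = 0.12, -0.05, 0.30      # point 0.3 m in front of the camera (assumed)
x, y = project(X, Y, Z, f)       # pixel coordinates on the image plane
X2, Y2, Z2 = back_project(x, y, Z, f)

assert np.allclose([X2, Y2, Z2], [X, Y, Z])
```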

The camera model can be integrated into a robot model through a base frame. As in a robot model, the base coordinate frame gives the different coordinate frames a common point of reference. The camera model can use a base frame to relate more than one camera together, which allows for depth measurements; this is called stereo vision [49, 50]. A base frame can also be used to relate the camera to the robot model. In this thesis the camera model is used to relate the camera to the robot model. Detailed information on how this was implemented is given in Section 3.1.


3 Localization Based on Fusion of the INS and Vision System

3.1 Camera Model

The camera model is a set of equations that describe the relationship between the feature movement in the camera's images and the robot's movement. These models allow the camera to be used as a motion sensor when tracking a set of features.

The camera model describes the movement of the camera from k to k + 1. The model also relates the coordinate frame of the camera to the robot coordinate frame and then to the world coordinate frame. A diagram of this is shown in Figure 3.1.

Figure 3.1: Coordinate Frame Conversion


The coordinate transformation from the world coordinate frame to the robot coordinate frame at time k is given by Equation 3.1.

\[
Q_1 = Q_{T1}\, Q_{Rz1}\, Q_{Ry1}\, Q_{Rx1}
\tag{3.1}
\]

Q_{T1} is the translation matrix and Q_{Rz1}, Q_{Ry1} and Q_{Rx1} are the rotation matrices; they are given by Equations 3.2, 3.3, 3.4 and 3.5.



\[
Q_{T1} =
\begin{bmatrix}
1 & 0 & 0 & x_k \\
0 & 1 & 0 & y_k \\
0 & 0 & 1 & z_k \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.2}
\]

x_k, y_k and z_k represent the location of the robot in the world coordinate frame at time k.



\[
Q_{Rz1} =
\begin{bmatrix}
\cos\psi_k & -\sin\psi_k & 0 & 0 \\
\sin\psi_k & \cos\psi_k & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.3}
\]

\(\psi_k\) is the yaw orientation of the robot with respect to the world coordinate frame at time k.




\[
Q_{Ry1} =
\begin{bmatrix}
\cos\theta_k & 0 & \sin\theta_k & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta_k & 0 & \cos\theta_k & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.4}
\]

\(\theta_k\) is the pitch orientation of the robot with respect to the world coordinate frame at time k.



\[
Q_{Rx1} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\phi_k & -\sin\phi_k & 0 \\
0 & \sin\phi_k & \cos\phi_k & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.5}
\]

\(\phi_k\) is the roll orientation of the robot with respect to the world coordinate frame at time k.

The coordinate transformation from the robot to the camera at time k is represented by Equation 3.6.

\[
Q_2 = Q_{Rx2}\, Q_{T2}
\tag{3.6}
\]

Q_{T2}, see Equation 3.7, represents the offset of the camera coordinate frame with respect to the robot coordinate frame in millimeters, while Q_{Rx2}, see Equation 3.8, represents the rotation about the x axis of the camera coordinate frame with respect to the robot coordinate frame.




\[
Q_{T2} =
\begin{bmatrix}
1 & 0 & 0 & -15 \\
0 & 1 & 0 & -12 \\
0 & 0 & 1 & -70 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.7}
\]

\[
Q_{Rx2} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\frac{-\pi}{2} & -\sin\frac{-\pi}{2} & 0 \\
0 & \sin\frac{-\pi}{2} & \cos\frac{-\pi}{2} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.8}
\]

The coordinate conversion from the world coordinate frame to the robot coordinate frame at time k + 1 is given by Equation 3.9.

\[
Q_3 = Q_{T3}\, Q_{Rz3}\, Q_{Ry3}\, Q_{Rx3}
\tag{3.9}
\]

Q_{T3} is the translation matrix and Q_{Rz3}, Q_{Ry3} and Q_{Rx3} are the rotation matrices; they are given by Equations 3.10, 3.11, 3.12 and 3.13.



\[
Q_{T3} =
\begin{bmatrix}
1 & 0 & 0 & x_{k+1} \\
0 & 1 & 0 & y_{k+1} \\
0 & 0 & 1 & z_{k+1} \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.10}
\]

x_{k+1}, y_{k+1} and z_{k+1} represent the location of the robot in the world coordinate frame at time k + 1.




\[
Q_{Rz3} =
\begin{bmatrix}
\cos\psi_{k+1} & -\sin\psi_{k+1} & 0 & 0 \\
\sin\psi_{k+1} & \cos\psi_{k+1} & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.11}
\]

\(\psi_{k+1}\) is the yaw orientation of the robot with respect to the world coordinate frame at time k + 1.

\[
Q_{Ry3} =
\begin{bmatrix}
\cos\theta_{k+1} & 0 & \sin\theta_{k+1} & 0 \\
0 & 1 & 0 & 0 \\
-\sin\theta_{k+1} & 0 & \cos\theta_{k+1} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.12}
\]

\(\theta_{k+1}\) is the pitch orientation of the robot with respect to the world coordinate frame at time k + 1.



\[
Q_{Rx3} =
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos\phi_{k+1} & -\sin\phi_{k+1} & 0 \\
0 & \sin\phi_{k+1} & \cos\phi_{k+1} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.13}
\]

\(\phi_{k+1}\) is the roll orientation of the robot with respect to the world coordinate frame at time k + 1.

The coordinate transformation from the robot to the camera at time k + 1 is Q_4. It is noted that Q_4 is equal to Q_2 because the camera is fixed to the robot and cannot move independently of the robot.

\[
Q_4 = Q_2
\tag{3.14}
\]

The coordinate transformation from the camera at time k to the camera at time k + 1 is Q_5, see Equation 3.15. This describes the motion of the camera between two images. Q_5 must be calculated through the relationship of the movement of the features between camera images. Before Q_5 can be calculated, a set of features needs to be found in the image at time k. A variety of different methods can be used to detect features in the camera images; further discussion of this is given in Section 3.2. For now it is only important that these features can be found. Once the features have been identified in an image at time k, they must also be found in the next image. The relationship of the position of the features with respect to the camera position is described in Equation 3.16.



\[
Q_5 = Q_2^{-1} Q_1^{-1} Q_3 Q_2 =
\begin{bmatrix}
Q_{5_{11}} & Q_{5_{12}} & Q_{5_{13}} & Q_{5_{14}} \\
Q_{5_{21}} & Q_{5_{22}} & Q_{5_{23}} & Q_{5_{24}} \\
Q_{5_{31}} & Q_{5_{32}} & Q_{5_{33}} & Q_{5_{34}} \\
Q_{5_{41}} & Q_{5_{42}} & Q_{5_{43}} & Q_{5_{44}}
\end{bmatrix}
\tag{3.15}
\]

\[
\begin{bmatrix}
X_{im_{k+1}} \\ Y_{im_{k+1}} \\ Z_{im_{k+1}} \\ 1
\end{bmatrix}
= Q_5
\begin{bmatrix}
X_{im_k} \\ Y_{im_k} \\ Z_{im_k} \\ 1
\end{bmatrix}
\tag{3.16}
\]

X_{im_k}, Y_{im_k} and Z_{im_k} are the position of object i with respect to the camera at time k, where Z_{im_k} is the distance the object is away from the camera; X_{im_{k+1}}, Y_{im_{k+1}} and Z_{im_{k+1}} are the corresponding quantities at time k + 1.


By combining the non-inverting pinhole camera model, see Equation 2.7, and Equation 3.16, Equations 3.17 and 3.18 can be derived. These equations describe the relationship between the camera movement and the movement of the features i in the camera images.

\[
x_{i_{k+1}} = \frac{f}{Z_{im_{k+1}}}\, X_{im_{k+1}}
 = f\,\frac{Q_{5_{11}} X_{im_k} + Q_{5_{12}} Y_{im_k} + Q_{5_{13}} Z_{im_k} + Q_{5_{14}}}{Q_{5_{31}} X_{im_k} + Q_{5_{32}} Y_{im_k} + Q_{5_{33}} Z_{im_k} + Q_{5_{34}}}
\tag{3.17}
\]

\[
y_{i_{k+1}} = \frac{f}{Z_{im_{k+1}}}\, Y_{im_{k+1}}
 = f\,\frac{Q_{5_{21}} X_{im_k} + Q_{5_{22}} Y_{im_k} + Q_{5_{23}} Z_{im_k} + Q_{5_{24}}}{Q_{5_{31}} X_{im_k} + Q_{5_{32}} Y_{im_k} + Q_{5_{33}} Z_{im_k} + Q_{5_{34}}}
\tag{3.18}
\]

x_{i_{k+1}} and y_{i_{k+1}} are the location of feature i in the camera image at time k + 1. Given a minimum of six features, the movement Q_5 can be calculated.
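To show how these transformations chain together, the following minimal Python sketch (illustrative only; the pose values, focal length and feature point are invented) builds Q_1, Q_2 and Q_3, forms Q_5 as written in Equation 3.15, and projects a feature through Equations 3.17 and 3.18.

```python
import numpy as np

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]])

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])

def trans(x, y, z):
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T

def world_to_robot(x, y, z, yaw, pitch, roll):
    """Q1 / Q3 built as in Equations 3.1 and 3.9."""
    return trans(x, y, z) @ rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)

# Robot-to-camera offset, Equations 3.6-3.8 (offsets in mm as quoted in the thesis).
Q2 = rot_x(-np.pi / 2) @ trans(-15, -12, -70)

Q1 = world_to_robot(0, 0, 0, 0.0, 0.0, 0.0)           # robot pose at time k (assumed)
Q3 = world_to_robot(5, 1, 0, 0.02, 0.0, 0.0)          # robot pose at time k+1 (assumed)

Q5 = np.linalg.inv(Q2) @ np.linalg.inv(Q1) @ Q3 @ Q2  # Equation 3.15

f = 500.0                                             # focal length in pixels (assumed)
P_k = np.array([30.0, 20.0, 300.0, 1.0])              # feature w.r.t. camera at time k (mm)
P_k1 = Q5 @ P_k                                       # Equation 3.16
x_k1 = f * P_k1[0] / P_k1[2]                          # Equation 3.17
y_k1 = f * P_k1[1] / P_k1[2]                          # Equation 3.18
```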

3.2 Feature Matching

There are many ways of finding and tracking features in a set of camera images. This chapter describes the process of using block finding and the Lucas-Kanade algorithm for finding and tracking features from images captured during a 3D motion. The chapter continues with how to select the features and process them using image registration. The final topic in the chapter is a simple data fusion using the EKF to fuse the data from the camera with the data from the INS.

3.2.1 Block Find

Block finding is a 2D grid search that takes a block from the anchor frame (the previous frame) and searches for it in the target frame (the current frame), see Figure 3.2. The inputs Xstart and Ystart are the location where the search begins looking for the block in the target frame. The input search describes the size of the search window, i.e. the range over which the block will be searched for around the Xstart and Ystart locations. The outputs x and y are the location of the best match for the search in the specified window. The output result is the sum of the differences between the block and the match contained in the current frame, i.e. the target frame. It is noted that the block is just a block of pixels that was obtained from the anchor frame.

\[
[x, y, result] = blockfind(image1, block, Xstart, Ystart, search)
\]

\[
\text{search window}: \ \forall (x, y), \quad Xstart - \tfrac{search}{2} \le x \le Xstart + \tfrac{search}{2}, \quad Ystart - \tfrac{search}{2} \le y \le Ystart + \tfrac{search}{2}
\]

\[
result = \sum \left| image1_{ij} - block \right|
\]

where image1_{ij} is the best block match from the current frame.

This is the original method used to find features in the images. The INS predicts the location of the block and the search algorithm searches in the vicinity of the estimated location; the vicinity is described by a predetermined search range. This method works well when dealing with 2D movements without a rapid change in the direction the robot is moving, i.e. when the camera moves in the X and Y directions and the camera orientation with respect to the wall remains the same. A number of different fast search algorithms were examined, including the diamond search [36] and three step search [37]. It was discovered that these fast search algorithms are not effective because the images are so similar that the algorithms cannot converge to find the correct location. Normally this is an issue if the entire image has to be searched, but since the INS narrows down the search domain, the computational load of the search algorithm is not demanding.
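A minimal sketch of this kind of INS-assisted block search is shown below (illustrative Python, not the thesis implementation): the sum of absolute differences is evaluated only inside a small window centred on the INS-predicted location.

```python
import numpy as np

def block_find(target, block, x_start, y_start, search):
    """Exhaustive SAD search for `block` inside a window of the target frame.

    (x_start, y_start) is the INS-predicted top-left corner of the block;
    `search` is the window size in pixels. Returns (x, y, sad) of the best match.
    """
    bh, bw = block.shape
    best = (x_start, y_start, np.inf)
    for y in range(y_start - search // 2, y_start + search // 2 + 1):
        for x in range(x_start - search // 2, x_start + search // 2 + 1):
            if x < 0 or y < 0 or y + bh > target.shape[0] or x + bw > target.shape[1]:
                continue  # skip candidate positions that fall outside the frame
            sad = np.abs(target[y:y + bh, x:x + bw].astype(int) - block.astype(int)).sum()
            if sad < best[2]:
                best = (x, y, sad)
    return best
```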


Figure 3.2: Block Matching

Another improvement made to the block search algorithm with the use of the INS is that the block can be taken from an optimal location within the anchor frame. Originally the block was always taken from the center of the anchor frame and searched for in the target frame, but when the INS is used there is an estimate of the motion, and the block can be taken from a location that allows for the maximum movement. If this is not done, the maximum movement the camera can experience when finding a single block taken from the center of the anchor frame is a little less than half of the field of view; when the block is chosen from an optimal location, the movement can span almost the entire field of view. This essentially doubles the movement allowed per frame. The benefit also carries over when more than one block is searched for in a single image. For example, when dealing with 3D motions, a total of five blocks are used to increase the number of features. Under these circumstances, selecting the optimal locations for the blocks from the anchor frame still increases the allowed motion. This allows for quicker motions without having to increase the field of view (FOV) and lower the effective accuracy, i.e. when the FOV is increased, the size of each pixel in the image also increases. Therefore the accuracy of the camera decreases when the FOV is increased, given that the camera maintains the same resolution.


3.2.2 Lucas-Kanade

Similar to the block matching of a 2D motion, a 3D motion also requires an estimate for optimal performance. The main difference is that when the camera moves in 3D space, the block being searched for can change, i.e. the block changes shape as well as location. This changing shape must be accounted for to minimize the number of iterations of the search algorithm and help prevent the algorithm from diverging. The affine warp is a 2×3 matrix that describes the shape and location change of a block between two camera images. It is one of the inputs to the Lucas-Kanade algorithm, explained in Section 1.4.5. To calculate an estimate for the affine warp, an estimate for the motion of the camera between frames is required. Before the camera motion can be calculated, an estimate of the robot motion is required, see Equation 3.19. Equation 3.19 is the estimated displacement and orientation change of the robot from the anchor frame to the target frame. It is calculated from the change in the state space variables, i.e. the location and orientation at the time of the target frame, time k + 1, minus the location and orientation at the time of the anchor frame, time k. See Equations 3.20 to 3.25.





\[
\mathrm{motion}_{est} =
\begin{bmatrix}
X_{est} \\ Y_{est} \\ Z_{est} \\ \theta_{est} \\ \psi_{est} \\ \phi_{est}
\end{bmatrix}
\tag{3.19}
\]

\[
X_{est} = x_{k+1} - x_k
\tag{3.20}
\]

\[
Y_{est} = y_{k+1} - y_k
\tag{3.21}
\]

\[
Z_{est} = z_{k+1} - z_k
\tag{3.22}
\]

\[
\theta_{est} = \theta_{k+1} - \theta_k
\tag{3.23}
\]

\[
\psi_{est} = \psi_{k+1} - \psi_k
\tag{3.24}
\]

\[
\phi_{est} = \phi_{k+1} - \phi_k
\tag{3.25}
\]

By substituting these values of the motion estimation into the matrices Q_1 and Q_2, see Equations 3.1 and 3.6, Q_5 can be calculated, see Equation 3.15. Q_5 now describes the estimated camera motion between the anchor frame and the target frame.

The initial warp matrix p is calculated by using Equations 3.26, 3.28 and 3.29.

\[
p =
\begin{bmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23}
\end{bmatrix}
\tag{3.26}
\]

X, Y and Z are the locations of points 1, 2 and 3 with respect to the camera. The locations of the points at time k + 1 are calculated using the rotation and translation matrix Q_5, see Equation 3.27.

\[
\begin{bmatrix}
X_{p_1 k+1} & X_{p_2 k+1} & X_{p_3 k+1} \\
Y_{p_1 k+1} & Y_{p_2 k+1} & Y_{p_3 k+1} \\
Z_{p_1 k+1} & Z_{p_2 k+1} & Z_{p_3 k+1} \\
1 & 1 & 1
\end{bmatrix}
= Q_5
\begin{bmatrix}
X_{p_1 k} & X_{p_2 k} & X_{p_3 k} \\
Y_{p_1 k} & Y_{p_2 k} & Y_{p_3 k} \\
Z_{p_1 k} & Z_{p_2 k} & Z_{p_3 k} \\
1 & 1 & 1
\end{bmatrix}
\tag{3.27}
\]

\[
\begin{bmatrix}
a_{11} \\ a_{12} \\ a_{13} \\ 1
\end{bmatrix}
=
\begin{bmatrix}
X_{p_1 k} & Y_{p_1 k} & \frac{Z_{p_1 k}}{f} & 0 \\
X_{p_2 k} & Y_{p_2 k} & \frac{Z_{p_2 k}}{f} & 0 \\
X_{p_3 k} & Y_{p_3 k} & \frac{Z_{p_3 k}}{f} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}^{-1}
\begin{bmatrix}
X_{p_1 k+1} \\ X_{p_2 k+1} \\ X_{p_3 k+1} \\ 1
\end{bmatrix}
\tag{3.28}
\]

\[
\begin{bmatrix}
a_{21} \\ a_{22} \\ a_{23} \\ 1
\end{bmatrix}
=
\begin{bmatrix}
X_{p_1 k} & Y_{p_1 k} & \frac{Z_{p_1 k}}{f} & 0 \\
X_{p_2 k} & Y_{p_2 k} & \frac{Z_{p_2 k}}{f} & 0 \\
X_{p_3 k} & Y_{p_3 k} & \frac{Z_{p_3 k}}{f} & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}^{-1}
\begin{bmatrix}
Y_{p_1 k+1} \\ Y_{p_2 k+1} \\ Y_{p_3 k+1} \\ 1
\end{bmatrix}
\tag{3.29}
\]

f is the focal length of the camera.
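A small sketch of this estimation step is shown below (illustrative Python; the point placement, focal length and the assumed Q_5 are invented, and the linear solves mirror Equations 3.28 and 3.29 as reconstructed above).

```python
import numpy as np

f = 500.0                                    # focal length in pixels (assumed)
P_k = np.array([[0.0, 50.0,  0.0],           # three reference points w.r.t. the camera at time k
                [0.0,  0.0, 50.0],           # (columns are points p1, p2, p3, rows are X, Y, Z, 1)
                [300.0, 300.0, 300.0],
                [1.0, 1.0, 1.0]])

Q5 = np.eye(4)                               # estimated camera motion between frames
Q5[0, 3] = 5.0                               # here a small x translation (assumed)

P_k1 = Q5 @ P_k                              # Equation 3.27

# Build the 4x4 system matrix used in Equations 3.28 and 3.29.
M = np.zeros((4, 4))
M[:3, 0] = P_k[0, :]                         # X_{p_i,k}
M[:3, 1] = P_k[1, :]                         # Y_{p_i,k}
M[:3, 2] = P_k[2, :] / f                     # Z_{p_i,k} / f
M[3, 3] = 1.0

a1 = np.linalg.solve(M, np.append(P_k1[0, :], 1.0))   # a11, a12, a13, 1  (Equation 3.28)
a2 = np.linalg.solve(M, np.append(P_k1[1, :], 1.0))   # a21, a22, a23, 1  (Equation 3.29)

p = np.vstack([a1[:3], a2[:3]])              # initial 2x3 affine warp, Equation 3.26
```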

The estimated affine warp is sent to the Lucas-Kanade algorithm. The version used is the forwards additive version, see Section 1.4.5. This algorithm makes the fewest assumptions and therefore requires more calculations; in the future another version of the algorithm could be tried to increase the efficiency. All versions of Lucas-Kanade attempt to minimize the error of the block match between the anchor frame and the target frame. Essentially the Lucas-Kanade algorithm warps the block until it fits the target frame. If the initial estimate is wrong, the algorithm can diverge and incorrectly move/warp the block. The algorithm is very sensitive to movement and requires a good estimate of the location of the block, which is why an initial block search is used.

The output of the Lucas-Kanade algorithm is the p matrix and the rms error. If the rms error is greater than 4, it is assumed that the match is not good and hence the current target frame is not used. This means that the data will not be passed on to the image registration and the EKF will not be updated until the next frame.


3.2.3 Selecting Feature Points

Once the image is aligned by the affine image alignment, the p matrix is obtained from the Lucas-Kanade algorithm. The x and y pixel movement can then be calculated by

x_pix = astartX − P_{1,3} − 1
y_pix = astartY − P_{2,3} − 1

The features selected are the four corners of the five blocks. This provides 20 feature points spread throughout the image. In the anchor frame the 64×64 pixel blocks are chosen from the optimal position to best represent the expected movement, see Section 3.2.1. For all experiments in this thesis the blocks had a separation of 100 pixels in the anchor frame. In the future the separation could be a variable that is tuned using the estimated motion. This would allow for faster motions, by decreasing the separation and increasing the allowed movement per frame, or for increased sensitivity to rotation by increasing the separation; with larger separations the features at the edges change at a greater rate than the center features when rotations occur.

The anchor points, appoints, are selected by taking the four corners of each block. astartX and astartY are the x and y positions of the top left corner of the current block, and blocksizeX and blocksizeY are the x and y sizes of the blocks. For all experiments in this research, the block size was 64 pixels in both x and y.






\[
appoints =
\begin{bmatrix}
astartX & astartX & blocksizeX + astartX & blocksizeX + astartX \\
astartY & blocksizeY + astartY & blocksizeY + astartY & astartY
\end{bmatrix}
\tag{3.30}
\]

The target points are calculated by selecting the four corners of each target block match. Taking these points from the anchor block and multiplying them by M, the warped points can be calculated, see Equation 3.32. M, see Equation 3.31, is calculated using the p matrix from the output of the Lucas-Kanade algorithm. From here the target frame features can be extracted, see Equation 3.34.

\[
M =
\begin{bmatrix}
P_{1,1} + 1 & P_{1,2} & P_{1,3} \\
P_{2,1} & P_{2,2} + 1 & P_{2,3} \\
0 & 0 & 1
\end{bmatrix}
\tag{3.31}
\]

\[
warp\_points = M
\begin{bmatrix}
appoints_{1,1} & appoints_{1,2} & appoints_{1,3} & appoints_{1,4} \\
appoints_{2,1} & appoints_{2,2} & appoints_{2,3} & appoints_{2,4} \\
1 & 1 & 1 & 1
\end{bmatrix}
\tag{3.32}
\]

\[
image1\_points =
\begin{bmatrix}
astartX & astartY \\
astartX + blocksizeX & astartY \\
astartX & astartY + blocksizeY \\
astartX + blocksizeX & astartY + blocksizeY
\end{bmatrix}
\tag{3.33}
\]

\[
image2\_points =
\begin{bmatrix}
warp\_points_{1,1} & warp\_points_{2,1} \\
warp\_points_{1,2} & warp\_points_{2,2} \\
warp\_points_{1,3} & warp\_points_{2,3} \\
warp\_points_{1,4} & warp\_points_{2,4}
\end{bmatrix}
\tag{3.34}
\]
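The following short sketch (illustrative Python; the block position and affine parameters are invented) builds the anchor corners of Equation 3.30, forms M as in Equation 3.31, and warps the corners into the target frame as in Equations 3.32 to 3.34.

```python
import numpy as np

astartX, astartY = 200, 150          # top-left corner of one 64x64 block (assumed)
blocksizeX = blocksizeY = 64

# Four corners of the anchor block, Equation 3.30.
appoints = np.array([
    [astartX, astartX,              blocksizeX + astartX, blocksizeX + astartX],
    [astartY, blocksizeY + astartY, blocksizeY + astartY, astartY],
])

# Affine parameters from Lucas-Kanade (made-up values for illustration).
p = np.array([[0.01, 0.00,  3.2],
              [0.00, 0.02, -1.5]])

# Equation 3.31: homogeneous warp matrix.
M = np.array([[p[0, 0] + 1, p[0, 1],     p[0, 2]],
              [p[1, 0],     p[1, 1] + 1, p[1, 2]],
              [0.0,         0.0,         1.0]])

# Equation 3.32: warp the anchor corners into the target frame.
warp_points = M @ np.vstack([appoints, np.ones(4)])

# Anchor- and target-frame feature coordinates, one corner per row
# (cf. Equations 3.33 and 3.34; corner ordering follows appoints above).
image1_points = appoints.T
image2_points = warp_points[:2, :].T
```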

3.3 Image Registration

Image registration is an iterative process based on feature matching between a model of the scene and points from its image. It searches for a transformation Q_{data/model} that makes the data features match the model features within a given error. The model used for this experiment is derived from the first image. In order for this to work a number of assumptions have to be made. First, the location of the selected points in reference to the camera frame must be known. For this application the camera is taking pictures of a wall. The wall is assumed to be flat and the starting orientation of the wall with respect to the vehicle must be known; for this research the vehicle always starts parallel to the wall. For a pipe, the assumption that the wall is flat is not valid, but the general surface of the pipe wall is still known and the curved surface geometry has to be taken into account. If this cannot be done, then another sensor, such as a rangefinder, is required to scan the surface of the pipe to obtain a range measurement for the wall. If the surface is relatively flat or uniform, then a set of low density range measurements will suffice. If the surface is complex with changing depths, then a high density scan will be needed.


Figure 3.3: Image Registration

The Lucas-Kanade algorithm is used to match groups of points from the two images, allowing the algorithm to calculate the change in the camera position between the images. The differences between the points are caused by the difference of the camera position between these images, see Equations 3.35 and 3.36. The change in camera position can be estimated using Equations 3.37 and 3.38. Equations 3.37 and 3.38 require a minimum of three matching points to work, but 20 points were used in all the experiments. A least squares method was used to calculate ΔT and ΔR because more than three matching points were used, which minimizes the error caused by noise. ΔT and ΔR are then used to update the translation and rotation values for the next iteration. This is repeated until 5 iterations have passed. The final rotation and translation provided by the image registration are directly related to the camera's movement and are fused with the INS data using an EKF.

\[
\delta x_i = x_i(R^0, T^0) - x_i^{actual}
\tag{3.35}
\]

\[
\delta y_i = y_i(R^0, T^0) - y_i^{actual}
\tag{3.36}
\]

\[
\delta x_i \approx \frac{f}{Z_i}\Delta T_1 - \frac{f X_i}{Z_i^2}\Delta T_3 - \frac{f X_i Y_i}{Z_i^2}\Delta R_1 + \frac{f (X_i^2 + Z_i^2)}{Z_i^2}\Delta R_2 - \frac{f Y_i}{Z_i}\Delta R_3
\tag{3.37}
\]

\[
\delta y_i \approx \frac{f}{Z_i}\Delta T_2 - \frac{f Y_i}{Z_i^2}\Delta T_3 - \frac{f (Y_i^2 + Z_i^2)}{Z_i^2}\Delta R_1 + \frac{f X_i Y_i}{Z_i^2}\Delta R_2 + \frac{f X_i}{Z_i}\Delta R_3
\tag{3.38}
\]

\[
\Delta T =
\begin{bmatrix}
\Delta X \\ \Delta Y \\ \Delta Z
\end{bmatrix}
\tag{3.39}
\]

\[
\Delta R =
\begin{bmatrix}
\Delta\theta \\ \Delta\phi \\ \Delta\psi
\end{bmatrix}
\tag{3.40}
\]
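The sketch below (illustrative Python, not the thesis code; point values are assumed and the Jacobian rows follow Equations 3.37 and 3.38 as reconstructed above) shows how one least-squares iteration for ΔT and ΔR can be set up from the feature residuals.

```python
import numpy as np

def registration_step(points, dx, dy, f):
    """One least-squares update of (dT, dR) from feature residuals.

    points: (N, 3) array of X, Y, Z for each feature w.r.t. the camera.
    dx, dy: (N,) residuals of Equations 3.35 and 3.36.
    Returns the 6-vector [dT1, dT2, dT3, dR1, dR2, dR3].
    """
    A, b = [], []
    for (X, Y, Z), ex, ey in zip(points, dx, dy):
        # Row for delta-x, Equation 3.37.
        A.append([f / Z, 0, -f * X / Z**2,
                  -f * X * Y / Z**2, f * (X**2 + Z**2) / Z**2, -f * Y / Z])
        b.append(ex)
        # Row for delta-y, Equation 3.38.
        A.append([0, f / Z, -f * Y / Z**2,
                  -f * (Y**2 + Z**2) / Z**2, f * X * Y / Z**2, f * X / Z])
        b.append(ey)
    sol, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return sol   # [dT, dR], used to update the pose for the next iteration
```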

The major problem with using image registration was providing a confidence in how accurate the results are. There are two places that affect the results of the image registration. The first is the feature matching. If the Lucas-Kanade algorithm fails to provide a correct match of the block, then the change in features has little meaning and the results obtained from the image registration are meaningless. For this reason a threshold was used to dictate whether the feature match is successful or not. The second place that errors can come from is the image registration itself. If the motion estimation is inadequate, the algorithm may diverge and provide an incorrect motion. To examine the effects of both of these issues, a set of simulated experiments was created, see Section 4.4.


3.4 EKF Technique

The EKF is used to fuse the camera data and the INS data together. The state variables for the EKF are shown in Equation 3.41.

\[
X_k =
\begin{bmatrix}
x_k \\ v_{x_k} \\ y_k \\ v_{y_k} \\ z_k \\ v_{z_k} \\ \theta_k \\ \phi_k \\ \psi_k
\end{bmatrix}
\tag{3.41}
\]

x, y, z = displacement traveled in the world coordinate frame
v_x, v_y, v_z = current velocity in the world coordinate frame
θ = pitch, φ = roll, ψ = yaw

The prediction step is essentially dead reckoning with the INS sensor. Equation 3.42 describes the relationship for updating the state variables during the prediction step, i.e. projecting the state ahead, see Section 1.4.2.

\[
\hat{X}_{k+1|k} = f(\hat{X}_{k|k}, U_k, k)
\tag{3.42}
\]






\[
f(\hat{X}_{k|k}, U_k, k) =
\begin{bmatrix}
x_k + T v_x \\
v_x + T A_1 \\
y + T v_y \\
v_y + T A_2 \\
z + T v_z \\
v_z + T A_3 \\
\theta + T \dot{\theta}_{ins} \\
\phi + T \dot{\phi}_{ins} \\
\psi + T \dot{\psi}_{ins}
\end{bmatrix}
\tag{3.43}
\]

A1 = (Ax cosθcosψ + Ay (sinθsinφcosψ − cosφsinψ) + Az (sinφsinψ + sinθcosφcosψ))

A2 = (Ax cosθsinψ + Ay (sinθsinφsinψ + cosφcosψ) + Az (sinθcosφsinψ − sinφcosψ))

A3 = (−Ax sinθ + Ay cosθsinφ + Az cosθcosφ − Ag)

Ax, Ay, Az = measured acceleration from the INS with respect to the vehicle
Ag = acceleration due to gravity with respect to the world coordinate system
θ̇_ins = angular velocity of pitch measured by the INS
φ̇_ins = angular velocity of roll measured by the INS
ψ̇_ins = angular velocity of yaw measured by the INS
T = the time interval of the measurements coming from the INS
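A minimal sketch of this prediction step is shown below (illustrative Python; the state ordering follows Equation 3.41 and the rotation terms follow A1 to A3, while the gravity constant and inputs are assumed values).

```python
import numpy as np

def ekf_predict_state(X, A_body, rates, T, g=9.81):
    """Dead-reckoning state prediction following Equation 3.43.

    X:      state [x, vx, y, vy, z, vz, theta, phi, psi]
    A_body: accelerations [Ax, Ay, Az] measured by the INS (vehicle frame)
    rates:  angular rates [theta_dot, phi_dot, psi_dot] from the gyros
    T:      INS sample interval in seconds
    """
    x, vx, y, vy, z, vz, th, ph, ps = X
    Ax, Ay, Az = A_body

    # Rotate the body accelerations into the world frame and remove gravity (A1-A3).
    A1 = Ax*np.cos(th)*np.cos(ps) + Ay*(np.sin(th)*np.sin(ph)*np.cos(ps) - np.cos(ph)*np.sin(ps)) \
         + Az*(np.sin(ph)*np.sin(ps) + np.sin(th)*np.cos(ph)*np.cos(ps))
    A2 = Ax*np.cos(th)*np.sin(ps) + Ay*(np.sin(th)*np.sin(ph)*np.sin(ps) + np.cos(ph)*np.cos(ps)) \
         + Az*(np.sin(th)*np.cos(ph)*np.sin(ps) - np.sin(ph)*np.cos(ps))
    A3 = -Ax*np.sin(th) + Ay*np.cos(th)*np.sin(ph) + Az*np.cos(th)*np.cos(ph) - g

    th_dot, ph_dot, ps_dot = rates
    return np.array([x + T*vx, vx + T*A1,
                     y + T*vy, vy + T*A2,
                     z + T*vz, vz + T*A3,
                     th + T*th_dot, ph + T*ph_dot, ps + T*ps_dot])
```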


Equation 3.44 describes the relationship used to calculate the error covariance, i.e. projecting the error covariance ahead.

\[
P_{k+1|k} = F_k P_{k|k} F_k^T + L_k Q_k L_k^T
\tag{3.44}
\]



\[
F_k = \left.\frac{\partial f}{\partial X}\right|_{X=\hat{X}_{k|k}} =
\begin{bmatrix}
1 & T & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & F_{2,7} & F_{2,8} & F_{2,9} \\
0 & 0 & 1 & T & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & F_{4,7} & F_{4,8} & F_{4,9} \\
0 & 0 & 0 & 0 & 1 & T & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & F_{6,7} & F_{6,8} & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.45}
\]

F2,7 = T (−Ax sinθcosψ + Ay cosθsinφcosψ + Az cosθcosφcosψ)

F2,8 = T (Ay (sinθcosφcosψ + sinφsinψ) + Az (cosφsinψ − sinθsinψcosψ))

F2,9 = T (−Ax cosθsinψ + Ay (−sinθsinφsinψ − cosφcosψ) + Az (sinφcosψ − sinθcosφsinψ))

F4,7 = T (−Ax sinθsinψ + Ay cosθsinφsinψ + Az (cosθcosφsinψ − sinφcosψ))

F4,8 = T (Ay (sinθcosφsinψ − sinφcosψ) + Az (−sinθsinφsinψ − cosφcosψ))

F4,9 = T (Ax cosθcosψ + Ay (sinθsinφcosψ − cosφsinψ) + Az (sinθcosφcosψ + sinφsinψ))

F6,7 = T (−Ax cosθ − Ay sinθsinφ − Az sinθcosφ)

F6,8 = T (Ay cosθcosφ − Az cosθsinφ)





\[
L = T
\begin{bmatrix}
0 & 0 & 0 & 0 & 0 & 0 \\
\cos\theta\cos\psi & \sin\theta\sin\phi\cos\psi - \cos\phi\sin\psi & \sin\phi\sin\psi + \sin\theta\cos\phi\cos\psi & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
\cos\theta\sin\psi & \sin\theta\sin\phi\sin\psi + \cos\phi\cos\psi & \sin\theta\cos\phi\sin\psi - \sin\phi\cos\psi & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
-\sin\theta & \cos\theta\sin\phi & \cos\theta\cos\phi & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.46}
\]

The second part of the EKF is the update step. The update equations are 3.47, 3.48, 3.50 and 3.51. The purpose of the update step is to correct the error buildup from the INS using data from the camera. Equation 3.47 describes the updated estimate using the measurement data; this is the equation that updates the state variables.

\[
\hat{X}_{k+1|k+1} = \hat{X}_{k+1|k} + K_{k+1}\tilde{Y}_{k+1}
\tag{3.47}
\]

\[
\tilde{Y}_{k+1} = z_{camera} - h(\hat{X}_{k+1|k})
\tag{3.48}
\]


\[
h(X_k) =
\begin{bmatrix}
\Delta X \\ \Delta Y \\ \Delta Z \\ \Delta\theta \\ \Delta\phi \\ \Delta\psi
\end{bmatrix}
=
\begin{bmatrix}
X_{k+1|k} + (T_{cam} - T_{kal}) V_{x_{k+1|k}} - X_{anchor} \\
Y_{k+1|k} + (T_{cam} - T_{kal}) V_{y_{k+1|k}} - Y_{anchor} \\
Z_{k+1|k} + (T_{cam} - T_{kal}) V_{z_{k+1|k}} - Z_{anchor} \\
\theta_{k+1|k} - \theta_{anchor} \\
\phi_{k+1|k} - \phi_{anchor} \\
\psi_{k+1|k} - \psi_{anchor}
\end{bmatrix}
\tag{3.49}
\]

ΔX, ΔY, ΔZ = the movement in the X, Y and Z directions from the anchor frame to the target frame
Δθ, Δφ, Δψ = the rotation about the X, Y and Z axes from the anchor frame to the target frame
T_cam − T_kal = the time passed since the last prediction step of the Kalman filter

Equation 3.50 describes the update of the error covariance.

\[
P_{k|k} = (I - K_{k+1} H_{k+1}) P_{k+1|k}
\tag{3.50}
\]

\[
K_{k+1} = P_{k+1|k} H_{k+1}^T \left( H_{k+1} P_{k+1|k} H_{k+1}^T + R_{k+1} \right)^{-1}
\tag{3.51}
\]






\[
H =
\begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
(T_{cam} - T_{kal}) & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 \\
0 & (T_{cam} - T_{kal}) & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & (T_{cam} - T_{kal}) & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
\tag{3.52}
\]
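A compact sketch of the update step is given below (illustrative Python, not the thesis code); the measurement Jacobian is built in its 6×9 orientation so that the products in Equations 3.50 and 3.51 are dimensionally consistent, and the anchor-frame values and noise covariance are assumed inputs.

```python
import numpy as np

def ekf_update(X_pred, P_pred, z_cam, anchor, R, dt_cam):
    """EKF measurement update following Equations 3.47-3.52.

    X_pred, P_pred: predicted state (9,) and covariance (9, 9)
    z_cam:   camera measurement [dX, dY, dZ, dtheta, dphi, dpsi]
    anchor:  state [x, y, z, theta, phi, psi] at the anchor frame
    R:       6x6 measurement noise covariance
    dt_cam:  T_cam - T_kal, time since the last prediction step
    """
    # Measurement Jacobian in 6x9 form so that H P H^T is 6x6.
    H = np.zeros((6, 9))
    H[0, 0] = H[1, 2] = H[2, 4] = 1.0
    H[0, 1] = H[1, 3] = H[2, 5] = dt_cam
    H[3, 6] = H[4, 7] = H[5, 8] = 1.0

    # Predicted measurement h(X), Equation 3.49.
    h = np.array([X_pred[0] + dt_cam * X_pred[1] - anchor[0],
                  X_pred[2] + dt_cam * X_pred[3] - anchor[1],
                  X_pred[4] + dt_cam * X_pred[5] - anchor[2],
                  X_pred[6] - anchor[3],
                  X_pred[7] - anchor[4],
                  X_pred[8] - anchor[5]])

    y = z_cam - h                                       # innovation, Equation 3.48
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)                 # Kalman gain, Equation 3.51
    X_new = X_pred + K @ y                              # Equation 3.47
    P_new = (np.eye(9) - K @ H) @ P_pred                # Equation 3.50
    return X_new, P_new
```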

Originally the EKF was to include the image registration equations, so that the observation function h(X_k) would include the feature points rather than the change in displacement and orientation. The problem with this is that it complicated the observation function. This became a major issue because the Jacobian of the observation function, H, had to be calculated. The complexity of the model meant that it was not possible to code by hand, and the mathematical package MuPAD hit a software limit of 25,000 characters when trying to write the code. Attempts were made to break up the code so that MuPAD could transfer the equations to code, but this was not successful. Attempts were also made to simplify the model by making the image registration more local and then converting the results to the world coordinate frame from within the EKF. This led to other complications, and as such the EKF model illustrated above was used. Removing the image registration from the EKF simplified the observation matrix, but also made it difficult to assign a value to the covariance of the sensor input, i.e. the change in displacement and orientation. In future work this issue could be revisited and a different filtering algorithm could be used, such as a UKF or particle filter. This would bypass the need to calculate the Jacobian of the observation function, and as such a more complicated model which includes the image registration could be used.


4 Experimental Setup and Results

4.1 Simulator

The simulator was designed to represent the movement of a pipe inspection robot inside a water main. Figure 4.1 illustrates the dimensions of the simulator and displays the movement of both the cart and platform. This movement provides the motion characteristics that would be experienced by the free-swimming motion of a pipe inspection robot. The platform can clearly be seen in Figure 4.2.

Figure 4.1: Schematic of the Simulator

The mechanical structure of the simulator consists of a main platform that holds a dual-rail track system for the cart. Two tall pillars, one at each end of the platform, support the track. These pillars can be adjusted to change the track length by sliding them in a set of slots cut into the platform. The two ends of the track can move along a slot located in the middle of each pillar so that the inclination of the track can vary. By adjusting the pillars and the track, a variety of different lengths and angles can be achieved with very little effort. The track consists of two parallel rails that extend between the two tall pillars. This track is a low friction Ignus rail that keeps the cart in place using a set of Teflon-coated guides that provide a smooth sliding motion for the cart.

Figure 4.2: Picture of the Simulator

The cart is large enough to include a number of sensors, including a camera, an INS and a laser rangefinder. The position and orientation of the cart are controlled with a set of JR Kerr servo control boards that provide multi-axis movement. The simulator consists of a linear belt-driven actuator and a rotation mechanism that are controlled using the JR Kerr motion boards. Another set of boards provides both digital and analog IO to communicate with a variety of sensors. Both the cart and platform contain limit switches that allow the device to auto-level itself and adapt to any changes made by the user to the length and/or angle of the track.

The cart is moved along the track using a linear belt-driven actuator. The belt goes around two idle pulleys that provide a simple tension system for adjusting the tension of the belt when the inclination, and hence the length, of the track changes. For this reason, two shorter pillars are positioned between the taller pillars; each short pillar holds one of the idle pulleys. The pillars are mounted on the main platform, which can rotate about two steel shafts aligned along the center axis. The platform rotates about the center axis using a pulley system. As a result of the three different motions mentioned above, the cart can move along the inclined rail and swing, simulating a variety of different motion patterns.

The software of the simulator is divided into a Human Machine Interface (HMI), motion control, and data analysis. The HMI allows the user to have a wide range of simulation patterns, by changing the velocity, acceleration, and number of incremental movements along the track as well as the starting angle, angular velocity, and acceleration of the rotating platform. Since the length and angle of the track can vary, the program determines the track length and calibrates the simulator automatically before every simulation. This is accomplished by moving the cart along the track at a slow speed until a limit switch on the cart hits the pillars at each end of the track. Then, the software determines the position of the cart in the desired starting location. The next step is to level the main platform.

The simulator uses the on/o signal of a mercury switch to level the main

platform horizontally.

Once the platform is leveled, the platform rotates to a desired

55

starting angle and the experiment starts. The motion control software simultaneously records the position data measured by the optical encoders on the servomotors as well as the localization and visualization sensors. The data is recorded into a computer le for post-processing. Matlab scripts are used to import and process this data. The program uses a Denavit-Hartenburg (DH) model to transform all recorded data into a common coordinate frame so that positions obtained from the simulator can be compared to those obtained from the localization sensors.

4.2 Sensors

The localization system consists of an INS and a digital camera.

The INS is capable of tracking the linear motion and rotation of an object with very low processing power. This is very important in, for example, an AUV or a pipe inspection robot that runs on limited battery power. The INS is a MicroStrain 3DM-GX1, which utilizes a tri-axial DC accelerometer, a tri-axial magnetometer and a tri-axial gyroscope [51]. It uses an embedded microprocessor to fuse the data from the sensors to calculate the static and dynamic orientations. It is noted that the magnetometers will not be functional inside a pipe; hence, the use of an INS with gyroscopes is essential for the measurement of changes in orientation due to the rotation of the pipe inspection robot. For this reason the static measurements for the angles are neglected and the angle rates are used instead. This represents the data coming from the gyros instead of a combination of the gyros and magnetometers. The accuracy of the gyros is reported as 0.7 degrees/sec. The accelerometers track accelerations along the x, y and z axes with an accuracy of 0.0981 m/s². Theoretically, the velocity as well as the travel distance (and hence the new position of the sensor) can be computed by finding the integral of the measured accelerations with respect to time. However, due to the well-known drift problem of the INS, a small error in acceleration can accumulate over time and corrupt the results. Therefore, the INS cannot be considered a reliable means for the measurement of position over an extended period of time.
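To illustrate the drift problem, the short sketch below (illustrative Python; the bias value is invented and only roughly comparable to the quoted accelerometer accuracy) double-integrates a small constant acceleration error at the INS update rate of about 12 ms.

```python
import numpy as np

dt = 0.012           # INS update interval of about 12 ms
bias = 0.05          # assumed constant acceleration error in m/s^2
t = np.arange(0.0, 60.0, dt)

velocity_error = np.cumsum(np.full_like(t, bias) * dt)   # first integral
position_error = np.cumsum(velocity_error * dt)          # second integral

# After one minute, this bias alone corrupts the position estimate by roughly 90 m.
print(position_error[-1])
```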

The behavior of the INS used was investigated through several experiments. Figure 4.3 shows 20 experiments with the INS at rest. The measurements of the INS were recorded for about 2 seconds for each experiment. Each of the 20 experiments yielded completely different results, leading to unpredictable sensor behavior. Nevertheless, with an update rate of 12 milliseconds, the INS can be used to determine changes in position over very short periods of time and should make an excellent prediction of the movement of the camera [13].

Figure 4.3: INS Stationary Characteristics

The camera is a Pixelink PL-A741 with a wide angle lens. The camera is used to take pictures of the featured wall, which represents the inside surface of a pipe and is often void of any easily-matched features. Figure 4.4 shows two pictures, one taken when the camera is close to the wall and another when the camera is far from the wall. This illustrates the kind of pictures that the algorithm has to deal with.


Figure 4.4: Image Examples

The Pixelink PL-A741 is a Firewire (IEEE 1394) monochrome camera with a maximum resolution of 1280 x 1024. This camera is fully configurable, including the resolution. For most tests, a resolution of 496 x 496 was used because it provided a sufficient number of frames per second when the camera tests were being performed close to the wall. Having a high number of frames per second allowed for quick movements of the camera when it was close to the wall. It is noted that during fast motions, objects in the FOV can move across almost the full camera view and as such do not remain in more than two camera images.

An optional laser rangefinder provides a distance measurement between the camera and the wall. Since the laser rangefinder has a limited effective range, it was only usable in tests where the camera was close to the wall, i.e. the high speed tests. For the long three-dimensional tests the laser was not used because it was out of range.

4.3 Result Categories

The results are broken up into three categories: Simulated Motions (Section 4.4), Computer Generated Results (Section 4.6), and Simulator Experiments (Section 4.7).


Each category of results is different. The Simulated Motions are completely computer generated, meaning that the motions as well as the sensor readings are all computer generated. These experiments were performed to test the image registration under varying simulated conditions.

The Computer Generated Results represent data captured from the simulator for motions of the cart. The sensor data for the INS and camera are then generated and noise is added to this data. The reason for using the cart trajectory data is that the results from this section can be directly compared to the third category, the real world results.

The Simulator Experiments use real data for the camera and INS. This data is compared to the actual location of the cart which is gathered from the simulator.

4.4 Simulated Motions

Over 100,000 test patterns were created to find a relationship between the error in the image registration and the error of the motion estimation. The test patterns are made up of a variety of different motions, from single-parameter motions to motions involving the full 6 degrees of freedom. Just over 1% of the motions are predetermined motions, while the remaining motions are generated from a normal distribution. Of these randomly generated motions, 25% involve only translations, i.e. a combination of x, y and z movements, and another 25% involve only rotations, i.e. a combination of pitch, roll and yaw. The remaining 50% of the randomly generated motions are made up of all six motions, i.e. x, y, z, pitch, roll and yaw. The translations are made up of three equally-distributed sample sets with standard deviations of 2 mm, 2.66 mm and 4 mm. All of the angles have standard deviations of 0.33 degrees. The reasoning behind using small motions is that the motions the image registration will most likely come across will be small, because the time lapse between images is small. This sample set therefore represents the most likely scenarios in real applications.

The results of the simulated movements were processed in one of three ways. Case 1 represents the image registration trying to solve for all 6 motions. Case 2 represents the three translations as well as the pitch angle. The reason for using pitch instead of roll and yaw is that it has the most dissimilar motion with respect to the camera: in a pitch change the features rotate around the center axis of the picture. A roll movement mostly looks like a Z movement with small changes in the distance the features are away from each other, and a yaw movement mostly looks like an X movement but with small changes in the separation of the features. Case 3 is the three translations only. This eliminates many potential problems of divergence, but requires a good estimate of the angles to work properly.

Each of the three cases was examined with four different scenarios using the same data. These four scenarios are different combinations of adding noise and motion prediction. The noise that was added had a normal distribution with a standard deviation of one third of a pixel; this represents a 99% confidence that the noise will be within one pixel. Since the feature matching algorithm used has sub-pixel accuracy, this is a reasonable range for noise. When a motion prediction is used, a random amount of noise from a normal distribution is added to the actual motion. This noise has standard deviations of two thirds of a millimeter for translations and 0.033 degrees for rotations. This scenario represents an excellent estimation and is the best that can be expected. When no motion prediction is used, the motion estimation is assumed to be 0, i.e. the object did not move or rotate from the last image. This scenario represents a poor estimation.

The first set of results represents the best case scenario of an excellent motion prediction and no noise, see Table 4.1. Under these conditions Case 1 performed flawlessly, while Case 2 and Case 3 did not perform very well. Figure 4.5 shows the relationship between the image registration error and the motion error. Both Case 1 and Case 2 show a strong linear relationship between the process error and the actual error, i.e. as the process error increases so does the motion error. However, Case 3 does not show a linear relationship like Case 1 and Case 2; instead the results are pooled in a large central area. This is most likely caused by errors in the rotations, since the results for Case 2 and Case 3, though different in the figures, produced results in the table that are almost identical.

Case | σ²x (mm) | σ²y (mm) | σ²z (mm) | σ²pitch (deg) | σ²roll (deg) | σ²yaw (deg)
1 | 5.2521e-20 | 8.4066e-20 | 2.3824e-20 | 1.0636e-23 | 1.7149e-23 | 5.2987e-25
2 | 5.1124 | 0.3359 | 5.3206 | 4.4471e-07 | - | -
3 | 5.3253 | 0.3360 | 5.3609 | - | - | -

Table 4.1: Image Registration Results with Motion Prediction and No Noise

Figure 4.5: Image Registration Results with Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)


The next set of test results, see Table 4.2, represents no noise and no motion prediction. Here the results are very similar to those with the motion prediction. Once again Case 1 performs flawlessly and Case 2 and Case 3 do not perform very well. Additionally, all three cases performed worse than when there was a prediction. This is to be expected: with a prediction, the estimate sent to the image registration is close to the solution, which decreases the likelihood of a divergence. Another interesting observation is that the results in Figure 4.6 show a distinct separation between results in all three cases. This separation most likely indicates which motions were small and which were large, because the small motions will have closer estimates as all estimates were assumed zero. In addition, the results that performed very well in Case 2 and Case 3 are most likely the motions that had little or no rotation. The reason this does not show up when there is an excellent prediction is that a good prediction can still have an error in the rotation, but a motion with no rotation and an estimate of 0 will be the perfect estimation for rotation. It should be noted that roughly 25% of the motions did not contain any rotation at all.

Case | σ²x (mm) | σ²y (mm) | σ²z (mm) | σ²pitch (deg) | σ²roll (deg) | σ²yaw (deg)
1 | 1.4019e-17 | 1.4547e-16 | 3.7700e-17 | 2.7106e-21 | 2.9764e-20 | 1.4102e-22
2 | 10.2344 | 0.05797 | 10.1722 | 3.3285e-07 | - | -
3 | 10.2152 | 0.05780 | 10.2427 | - | - | -

Table 4.2: Image Registration Results with No Motion Prediction and No Noise

The results from Table 4.3 represent what can be expected when there is an excellent motion estimation, because varying amounts of noise were added as described above. This noise can cause the image registration to diverge even when there is an excellent estimation. The results from this test clearly show the benefits of Case 2 and Case 3 over Case 1. When there is no noise, Case 1 performs flawlessly, but when noise is added it will adjust the rotations to find a good fit to this noise, and as a result the solution degrades. This noise, on the other hand, helps Case 2 and Case 3 because the translations cause a distinct motion in the pixels that should not resemble noise. When there are 6 degrees of freedom, the fit can continue to improve even though the results are degrading, i.e. there is more freedom to adjust the movement to fit the data, but this movement is not necessarily the correct motion.

Figure 4.6: Image Registration Results with No Motion Prediction and No Noise (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

Case | σ²x (mm) | σ²y (mm) | σ²z (mm) | σ²pitch (deg) | σ²roll (deg) | σ²yaw (deg)
1 | 11.3449 | 0.8301 | 11.8220 | 1.9147e-06 | 1.6652e-04 | 1.6597e-04
2 | 0.07170 | 0.08844 | 0.07479 | 9.3443e-07 | - | -
3 | 0.07340 | 0.08689 | 0.07329 | - | - | -

Table 4.3: Image Registration Results with Motion Prediction


Figure 4.7: Image Registration Results with Motion Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

The final set of results is displayed in Table 4.4 and Figure 4.8. This represents a bad case scenario where there is noise and no estimation. Here all three cases perform poorly. Case 1 seemed to perform the same as when there was a good estimation. This indicates that the problem is inherent to the noise and not the estimation, otherwise there would be a difference between these results.

Case | σ²x (mm) | σ²y (mm) | σ²z (mm) | σ²pitch (deg) | σ²roll (deg) | σ²yaw (deg)
1 | 11.3387 | 0.8230 | 11.8278 | 1.8962e-06 | 1.6664e-04 | 1.6585e-04
2 | 5.1149 | 0.4189 | 5.3248 | 1.3814e-06 | - | -
3 | 5.3262 | 0.4192 | 5.3617 | - | - | -

Table 4.4: Image Registration Results with No Prediction


Figure 4.8: Image Registration Results with No Prediction (Top Left Case 1, Top Right Case 2, Bottom Center Case 3)

4.5 Experiment Parameters

For all 3D test examples, the same test was used. This test pattern starts with the platform in a level position. The cart quickly accelerates to a constant velocity. Once the cart starts to move, the platform rotates quickly towards the wall and then maintains its position. The velocity of the cart is very slow; this allows testing the localization over long periods. This test pattern evaluates the accuracy of the algorithm over different speeds, i.e. the start of the test is much faster than the rest of the test. It is noted that fast motions, which were tested in the 2D tests, were not possible in the 3D tests. The reason for this was that the iris of the camera lens had to be closed to increase the depth of focus in the absence of an auto-focus camera. In the future, further experiments could be performed with an auto-focus camera system.

For all of the 3D experiments, different cases were used for the configuration of the image registration, see Section 4.4. It is noted that for the real world and computer generated results, only Case 1 and Case 3 are displayed in this thesis. Case 2 was performed, but did not show any advantages over Case 1 and Case 3 and as such was removed from this thesis.

For each case there is a frame skip category. The frame skip number describes how often a camera image is processed, see Table 4.5. The more time that passes between images, the more the motion estimation will degrade, and the more likely the image registration will diverge. On the other hand, the greater the amount of time that passes, the larger the motion is, and as such the greater the motion-to-noise ratio will be. For Case 1, Case 2 and Case 3, frame skips 1, 5, 10, 20 and 30 were performed and are summarized in Tables 4.6 and 4.7. Frame skips 1 and 20 were selected to be displayed in detail in the thesis. It is noted that frame skips 5 and 10 produced results that were in between frame skips 1 and 20. Also, frame skip 30 failed to complete in all instances because of the large time lapse. This was most likely caused by the motion estimation degrading to such a point that the feature matching was not possible or the image registration diverged.

Frame Skip Number | 1 | 5 | 10 | 20 | 30
Delay between Camera Images (s) | 0.021 | 0.105 | 0.210 | 0.42 | 0.63

Table 4.5: Frame Skip Number to Time Conversion


4.6 Computer Generated Results

All of the computer generated results use the same motion data captured from the actual simulator; only the sensor (camera and INS) data is generated. This means the actual location data is collected from the simulator, and the camera, INS and EKF data are generated to represent the expected sensor data given the actual location of the cart. This allows a direct comparison of the computer generated results to the real world results. The results are separated into two sections, Case 1 and Case 3; these are the same cases described in Section 4.4. For each case, frame skips 1 and 20 were examined, see Section 4.5 for further details. It is noted that simulations were run for Cases 1, 2 and 3 with frame skips 1, 5, 10, 20 and 30. A summary of these tests is presented in Table 4.6.

Frame Skip | Case 1 | Case 2 | Case 3
1 | Pass | Pass | Pass
5 | Pass | Pass | Pass
10 | Pass | Pass | Pass
20 | Pass | Pass | Pass
30 | Pass | Pass | Pass

Table 4.6: Computer Generated Results Summary

Case 1

Case 1 represents all 6 degrees of freedom within the image registration algorithm. In Section 4.4, the results indicated that this case performed flawlessly without noise, but quickly diverged when noise was introduced. The computer generated results support this as well. Figures 4.9, 4.11 and 4.13 show the displacement results with every frame being processed. These results demonstrate that the image registration produced a better result than the INS alone, but was very susceptible to the noise in the sensor data. Figures 4.10, 4.12 and 4.14 show the velocity results. These figures clearly show the effects of the noise over a small period of time: the velocity measurements are scattered and unusable.


The results for the orientations are displayed in Figures 4.15, 4.16 and 4.17. From these figures it is clear that the INS provides a far better representation of the true orientation than the camera.

Figure 4.9: Simulated X Distance, Case 1, Frame Skip 1

Figure 4.10: Simulated X Velocity, Case 1, Frame Skip 1


Figure 4.11: Simulated Y Distance, Case 1, Frame Skip 1

Figure 4.12: Simulated Y Velocity, Case 1, Frame Skip 1

Figure 4.13: Simulated Z Distance, Case 1, Frame Skip 1


Figure 4.14: Simulated Z Velocity, Case 1, Frame Skip 1

Figure 4.15: Simulated Pitch, Case 1, Frame Skip 1

Figure 4.16: Simulated Roll, Case 2, Frame Skip 10


Figure 4.17: Simulated Yaw, Case 1, Frame Skip 1

The performance of the image registration with six degrees of freedom is greatly increased when larger motions occur between images. This is clearly illustrated by the frame skip 20 results, see Figures 4.18 to 4.26. It is also important to note that the results are a large improvement even though the motion estimation being passed to the image registration will be a lot worse, because the algorithm relies on the INS for 20 times longer than in the above results.

Figure 4.18: Simulated X Distance, Case 1, Frame Skip 20


Figure 4.19: Simulated X Velocity, Case 1, Frame Skip 20

Figure 4.20: Simulated Y Distance, Case 1, Frame Skip 20

Figure 4.21: Simulated Y Velocity, Case 1, Frame Skip 20


Figure 4.22: Simulated Z Distance, Case 1, Frame Skip 20

Figure 4.23: Simulated Z Velocity, Case 1, Frame Skip 20

Figure 4.24: Pitch, Case 1, Frame Skip 20


Figure 4.25: Simulated Roll, Case 1, Frame Skip 20

Figure 4.26: Simulated Yaw, Case 1, Frame Skip 20

Case 3

In the third case for the image registration, only the displacements are calculated. This means that the angles are assumed correct in the motion estimation. This case clearly shows that the noise introduced into the simulated camera data has a much smaller effect on the overall results. The velocity measurements in Figures 4.28, 4.30 and 4.32 are not nearly as chaotic as in Figures 4.10, 4.12 and 4.14.


Figure 4.27: Simulated X Distance, Case 3, Frame Skip 1

Figure 4.28: Simulated X Velocity, Case 3, Frame Skip 1

Figure 4.29: Simulated Y Distance, Case 3, Frame Skip 1


Figure 4.30: Simulated Y Velocity, Case 3, Frame Skip 1

Figure 4.31: Simulated Z Distance, Case 3, Frame Skip 1

Figure 4.32: Simulated Z Velocity, Case 3, Frame Skip 1


Figure 4.33: Simulated Pitch, Case 3, Frame Skip 1

Figure 4.34: Simulated Roll, Case 3, Frame Skip 10

Figure 4.35: Simulated Yaw, Case 3, Frame Skip 1

Similar improvements for Case 3 are also seen when comparing the frame skip 20 data to the frame skip 1 data. This further supports the idea that the noise compromises the results when using frame skip 1 at the current speeds. Case 3 with frame skip 20 produced the best results of all the simulated tests, which supports the results seen in Section 4.4.

Figure 4.36: Simulated X Distance, Case 3, Frame Skip 20

Figure 4.37: Simulated X Velocity, Case 3, Frame Skip 20


Figure 4.38: Simulated Y Distance, Case 3, Frame Skip 20

Figure 4.39: Simulated Y Velocity, Case 3, Frame Skip 20

Figure 4.40: Simulated Z Distance, Case 3, Frame Skip 20


Figure 4.41: Simulated Z Velocity, Case 3, Frame Skip 20

Figure 4.42: Pitch, Case 3, Frame Skip 20

Figure 4.43: Simulated Roll, Case 3, Frame Skip 20


Figure 4.44: Simulated Yaw, Case 3, Frame Skip 20

4.7 Results from Real Experiments

2D Modeling Results

Early work was performed on testing the feasibility of a motion tracking system with a low-feature wall. The motion for the simulator was a 2D motion where the platform was locked in a flat position and the cart moved parallel to the wall. During this test 4 blocks were used with a separation of 64 pixels. A low separation value was used because of the fast-paced nature of this test; if a larger separation value was used, some of the blocks would not be in the target frame. This test was designed to see if the block matching algorithm was able to track objects under fast-paced motion.

Three scenarios were tested: INS only, camera only, and finally INS and camera together. For the camera-only results, the entire target frame was searched for the matching blocks. The best matching location was used to calculate the motion between the anchor frame and the target frame. For the camera + INS, the INS was used to predict the locations of the matching blocks in the target frame. Also, because there was an estimated motion, the blocks were taken from an optimal location within the anchor frame. It is noted that the actual location is data captured from the simulator on the location of the cart carrying the sensors. This data is obtained from optical encoders on the servo motors and will have an error of less than 1 mm, including belt flex.

Figures 4.45 and 4.46 clearly show that both the camera and the camera + INS performed well. These figures also show that the camera + INS combination performed better than the camera alone. It is noted that the camera + INS required only a fraction of the calculations of the camera alone, because the whole frame does not have to be searched, and still returned a superior result. The reason for the superior results is clearly indicated in Figure 4.47. Frames 157, 168 and 172 show that the motions surpassed 216 pixels. This means that when the blocks were grabbed from the center of the anchor frame, some or all of the block was missing in the target frame. So even though the entire frame was searched, the ideal match was not possible and the next closest match was taken. This match essentially means nothing, and the results show a decrease in the performance of the camera. It is also noted that the camera + INS can detect motions of up to 432 pixels; Figure 4.47 shows that it was able to detect motions of close to 300 pixels.

Figure 4.45: Block Matching Results


Figure 4.46: Block Matching Results (Zoom)

Figure 4.47: Block Matching Results Pixel Movement

3D Model Results

The results displayed in this section feature a single test processed in a number of ways. This allows different scenarios to be compared directly, since the same data is used in each scenario. The results are separated into two categories, Case 1 and Case 3. These two cases represent different configurations of the image registration algorithm; for further details see Section 3.3. This test builds on what was learned in the 2D motion tests and applies it to a 3D motion. It also relates to the simulated results seen in Section 4.6.

The results are categorized into the Actual, INS, EKF and Camera. The actual position data is obtained from the optical positioning sensors on the simulator, and their total error should be less than 1 mm; this would include the flex of the belt and flex in the apparatus frame. The INS results use only the data obtained from the INS, while the EKF is a complete fusion of the INS and camera data, and the camera results are the data obtained from the camera with only the INS data being used for prediction. This allows the results of the camera to be compared to both the INS and the EKF. It should also be noted that the actual position is taken from the simulator and these results are not infallible. Only the encoders are used to track the motion, so any belt flex or flex in the structure, as well as gear slack, will not be reflected in the actual position. The most important part of the results is that they follow the trend of the actual position.

Frame Skip | Case 1 | Case 2 | Case 3
1 | Pass | Pass | Pass
5 | Pass | Pass | Pass
10 | Pass | Pass | Pass
20 | Fail | Pass | Pass
30 | Fail | Fail | Fail

Table 4.7: Simulator Results Summary

Case 1. Case 1 with a frame skip of 1 did not perform well, see Figures 4.48 to 4.56. This was expected because both the simulated motions (see Section 4.4) and the computer generated results (see Section 4.6) predicted poor performance. This is most likely caused by the small motions between camera images. These motions are so small that the noise from the camera is of the same order as the motions. For instance, if the cart is traveling at a velocity of 6 mm/s in the X direction and the camera is approximately 300 mm from the wall, then with a frame skip of 1 the motion seen in the X direction for the camera will be about 1.1 pixels. If it is assumed that the standard deviation for the feature detection is one third of a pixel, which is the value used for noise in the simulated results of Sections 4.4 and 4.6, then this gives a signal to noise ratio (SNR) of 3.67. The Rose criterion states that an SNR of at least 5 is needed to distinguish image features at 100% [52].

Additionally, the X direction experiences the greatest motion with respect to the other parameters. This means that the Y and Z directions, as well as pitch, roll and yaw, will have much smaller SNRs. Another good indication of the high noise level is the velocity measurements, see Figures 4.49, 4.51 and 4.53. The velocity measurements for the camera swing consistently from one extreme to another. This was also shown in the computer generated results, see Figures 4.10, 4.12 and 4.14. It should also be noted that in the first 10 seconds of the experiment the results were good for the X, Y and Z displacements. The decline in performance after this point is most likely caused by a buildup of error in the orientation. This orientation error degrades the performance of the INS because the removal of the force of gravity will be wrong and the INS will indicate accelerations in the wrong direction. This means that the prediction of motion degrades, leading to a greater chance of divergence for the image registration algorithm.
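The SNR reasoning above can be summarised in a small helper. This is a sketch assuming a pinhole camera translating parallel to the wall; the focal length in pixels and the frame period are free parameters, not values taken from the thesis.

```python
def motion_snr(velocity_mm_s, frame_period_s, frame_skip,
               distance_mm, focal_px, feature_sigma_px=1.0 / 3.0):
    """Approximate image-plane motion between processed frames and its
    signal-to-noise ratio against the feature-detection noise."""
    displacement_mm = velocity_mm_s * frame_period_s * frame_skip
    motion_px = focal_px * displacement_mm / distance_mm  # pinhole projection
    return motion_px / feature_sigma_px

# Increasing frame_skip scales the SNR linearly, which is why larger skips
# clear the Rose criterion (SNR >= 5) while a frame skip of 1 does not.
```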

Figure 4.48: X Distance, Case 1, Frame Skip 1


Figure 4.49: X Velocity, Case 1, Frame Skip 1

Figure 4.50: Y Distance, Case 1, Frame Skip 1

Figure 4.51: Y Velocity, Case 1, Frame Skip 1


Figure 4.52: Z Distance, Case 1, Frame Skip 1

Figure 4.53: Z Velocity, Case 1, Frame Skip 1

Figure 4.54: Pitch, Case 1, Frame Skip 1


Figure 4.55: Roll, Case 1, Frame Skip 1

Figure 4.56: Yaw, Case 1, Frame Skip 1

Case 1 with a frame skip of 20 produced much better results than a frame skip of 1, see Figures 4.57 to 4.65. The X displacement (see Figure 4.57) performed exceptionally well, most likely because the X displacement has the highest SNR. The velocity results (see Figures 4.58, 4.60 and 4.62) are also greatly improved, and the erratic behavior that was most likely caused by noise has dissipated. In addition, the orientation measurements coming from the image registration are usually worse than the INS measurements, see Figures 4.63, 4.64 and 4.65. This was to be expected, as both the simulated motions and the computer generated results indicated poor orientation results. It is also noted that the image registration algorithm failed after 130 seconds. This is usually caused by an inability to obtain a match on the set of features due to poor estimation. As more and more frames are skipped, the estimation from the INS will only degrade further, which is shown in the frame skip of 20 when compared to smaller frame skips.

Figure 4.57: X Distance, Case 1, Frame Skip 20

Figure 4.58: X Velocity, Case 1, Frame Skip 20

Figure 4.59: Y Distance, Case 1, Frame Skip 20


Figure 4.60: Y Velocity, Case 1, Frame Skip 20

Figure 4.61: Z Distance, Case 1, Frame Skip 20

Figure 4.62: Z Velocity, Case 1, Frame Skip 20


Figure 4.63: Pitch, Case 1, Frame Skip 20

Figure 4.64: Roll, Case 1, Frame Skip 20

Figure 4.65: Yaw, Case 1, Frame Skip 20


Case 3. As described in Section 4.4, Case 3 limits the image registration to translations only. This means that the camera will only provide motion data in the X, Y, and Z directions.

When the frame skip is 1, Case 3 performs better than Case 1 (see Figures 4.66 to 4.71), but the results degrade over time. For the first 20 seconds of the experiment, the camera performed exceptionally well. Even in the Y direction (see Figure 4.68), the results are excellent at the start of the test. When the platform was moving, the camera was able to track the motion, but when the platform stopped, the SNR greatly decreased and the results deteriorated.
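A toy illustration of why the translation-only configuration is less noise-sensitive (this is not the thesis registration algorithm): with matched feature points, the least-squares translation is just the mean displacement, so the per-feature noise is averaged down, whereas a full six-degree-of-freedom fit must also resolve rotations from the same noisy points.

```python
import numpy as np

def estimate_translation(anchor_pts, target_pts):
    """Least-squares translation between two sets of matched points
    (rows are corresponding points): simply the mean displacement."""
    return np.mean(np.asarray(target_pts) - np.asarray(anchor_pts), axis=0)
```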

Figure 4.66: X Distance, Case 3, Frame Skip 1

Figure 4.67: X Velocity, Case 3, Frame Skip 1


Figure 4.68: Y Distance, Case 3, Frame Skip 1

Figure 4.69: Y Velocity, Case 3, Frame Skip 1

Figure 4.70: Z Distance, Case 3, Frame Skip 1


Figure 4.71: Z Velocity, Case 3, Frame Skip 1

Case 3 with a frame skip of 20 performs the best of all the cases, see Figures 4.72 to 4.77. Once again the results drift as time progresses, but the rate at which the drift occurs is reduced with respect to Case 3 with a frame skip of 1. The velocity measurements coming from the camera are also excellent, see Figures 4.73, 4.75 and 4.77. Another observation relates to the Y displacement, see Figure 4.74. While the camera results show a gradual increase in Y displacement with respect to the actual location, the results from the EKF show sharp spikes of increased Y displacement; the reverse also appears at times. This must be caused by the INS, and the most likely reason the INS starts to show acceleration in this direction is an incorrect orientation: the force of gravity has not been accounted for correctly, and the residual effects appear as a change in displacement. This means there is a possibility of correcting this error and increasing the accuracy with a better estimate of orientation, which can be a focus point for future work.
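To give a sense of scale for the gravity-compensation effect described above, the sketch below uses a constant-tilt-error approximation (an illustrative assumption, not a result from the thesis) and double-integrates the gravity component that leaks into a horizontal axis when the estimated orientation is off by a small angle.

```python
import numpy as np

def gravity_leak_drift(tilt_error_deg, duration_s, g_m_s2=9.81):
    """Apparent displacement from double-integrating the gravity component
    that leaks into a horizontal axis for a constant orientation error."""
    residual_accel = g_m_s2 * np.sin(np.radians(tilt_error_deg))
    return 0.5 * residual_accel * duration_s ** 2

# A 1-degree orientation error held for 10 s already accounts for roughly
# 8.5 m of apparent displacement, so even small orientation errors can
# dominate the INS-only position estimate.
```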


Figure 4.72: X Distance, Case 3, Frame Skip 20

Figure 4.73: X Velocity, Case 3, Frame Skip 20

Figure 4.74: Y Distance, Case 3, Frame Skip 20


Figure 4.75: Y Velocity, Case 3, Frame Skip 20

Figure 4.76: Z Distance, Case 3, Frame Skip 20

Figure 4.77: Z Velocity, Case 3, Frame Skip 20

Other observations worth noting are that the EKF usually produced worse results than the camera. One reason for this may be related to the covariance values used for the camera in the EKF: a constant value was used at all times. Ideally this value should be affected by the results of the image registration and feature matching. Another approach would be to incorporate the image registration equations into the Kalman filter. Originally this was attempted with an EKF, but problems occurred, see Section 3.4. In the future, a more advanced filtering technique can be used to overcome the difficulties experienced with the EKF.
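One possible shape of such an adaptive covariance (a sketch only; the scaling rule and the one-third-pixel noise floor are assumptions, not the thesis implementation) is to tie the camera measurement covariance to the RMS residual of the image registration:

```python
import numpy as np

def camera_measurement_covariance(residuals_px, floor_sigma_px=1.0 / 3.0):
    """Scale the 6-DOF camera measurement covariance with the RMS
    registration residual instead of holding it constant."""
    rms = float(np.sqrt(np.mean(np.square(residuals_px))))
    sigma = max(floor_sigma_px, rms)   # never claim better than the feature noise floor
    return (sigma ** 2) * np.eye(6)    # X, Y, Z, pitch, roll, yaw
```

With this rule, a poor registration (large residuals) automatically down-weights the camera update in the filter, while a clean registration lets the camera dominate the INS.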


5 Conclusions and Future Work

In this research, a localization system was developed and analyzed. The localization system involves a camera and an INS. The system was developed with a pipe inspection vehicle in mind, but can be applied to any environment where a wall is present.

This thesis focuses on the different components of the localization system, as well as a testing platform referred to as the simulator that was constructed for the purpose of testing the system. The localization system is composed of a camera and an INS. The INS is used to gather data on the motions that occur between camera images. This data is then used to predict the motion that occurred between camera images, which allows the machine vision algorithm to minimize its search and increases the chances that a successful match is found.

The machine vision algorithm is divided into a number of components. First, a block matching algorithm is used to refine the motion estimate provided by the INS. Next, an optical flow method based on the Lucas-Kanade algorithm is used to refine the results from the block match. This allows the blocks to change shape and account for the 3D motion that occurred between camera images. Once there are good matches for the blocks, feature points can be obtained. This method has one major advantage over other processes: there is no need for distinctive features in the images to obtain feature matches. The feature points are then passed to the image registration algorithm. The image registration looks at the change of location of the different feature points and calculates the motion required to account for the change in the features. The process continues recursively and determines the motion for as long as the localization system is moving.

A simulator was constructed to test the performance of the localization system. The simulator moves a cart containing the INS and camera along a wall from one known position to another known position along a given path. This allows the results from the algorithm to be directly compared to the actual location of the cart.

A number of different parameters were used when testing the algorithms. Tests included different configurations of the image registration and different intervals between camera frames. The results captured from these tests closely match the results gathered from simulated data.

When the image registration algorithm was used to determine both the translation and the orientation of the motion, it was very sensitive to noise but returned results superior to the INS alone. When the image registration was used to detect only the translation, the system was less sensitive to noise and the performance improved. Results also demonstrated that increasing the time between camera images increases the motion between images and, in effect, the SNR. This made it easier to detect the motion and hence increased the accuracy of the algorithm.

Even though the algorithms showed promising results, there is a need for further improvements. One of the major shortcomings was the EKF data fusion. In most circumstances, the EKF performed worse than the camera data. This is most likely caused by the poor estimate of the accuracy of the image registration data used in the EKF update of the covariance matrix. As long as the feature detection and image registration error values were below preset values, the accuracy of the results was always considered the same. If more advanced filtering techniques were used to replace the EKF, then the image registration calculation could be incorporated directly into the filter. This would allow the feature locations to be the sensor input, which in turn would allow for a more stable definition of the expected variances.

The Lucas-Kanade algorithm can also be modified to decrease the number of calculations required. This can be done by experimenting with other versions, including the compositional, inverse compositional and inverse additive approaches. In addition, a pyramid version of the algorithm can be examined for this purpose. If successful, this would alleviate the need for performing the block matching.
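As one concrete route to the pyramid variant mentioned above, an off-the-shelf pyramidal Lucas-Kanade tracker such as the one in OpenCV could stand in for the block matching and refinement stages. The sketch below is only an illustration of that idea, not the thesis implementation; window size and pyramid depth are arbitrary example values.

```python
import cv2
import numpy as np

def track_points_pyramidal(anchor_gray, target_gray, anchor_pts):
    """Pyramidal Lucas-Kanade tracking of points between two grayscale
    frames; returns the successfully tracked point pairs."""
    pts = np.asarray(anchor_pts, dtype=np.float32).reshape(-1, 1, 2)
    new_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        anchor_gray, target_gray, pts, None,
        winSize=(21, 21), maxLevel=3)  # 3 pyramid levels tolerate larger motions
    ok = status.ravel() == 1
    return pts[ok].reshape(-1, 2), new_pts[ok].reshape(-1, 2)
```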

If features were tracked over multiple frames, then the camera could be run at full speed, i.e. with a frame skip of 1. This would provide the maximum SNR while also limiting the time between updates from the camera. It would decrease the period of time the system relies on the INS and would increase the accuracy of the motion predictions. It should be noted that this idea would greatly complicate the image registration and filtering algorithms.

Another area where future work can be concentrated is the use of an auto-focus camera. This would allow the wall to remain in focus over a much larger range of distances to the wall. It would also allow for faster motions, because the iris can be opened further, which admits more light into the camera. If this algorithm is to be used in a real application, then this is a major priority.

Given the above shortcomings, and assuming an auto-focus camera does not have an ill effect on the performance of the system, the results for Case 3 with a frame skip of 20 can be used to extrapolate to a long-term test. Assuming the results would be similar to this test, and that the required accuracy is to locate a pit within one pipe length (6 m) in the X direction, then about 400 m of pipe could be scanned before this threshold is surpassed. This result is not accurate enough for a useful product in long water mains; ideally the localization system should achieve a range greater than 1000 m. With that said, the velocity measurements are more than accurate enough for reporting the speed of the vehicle. Another thing to keep in mind is that the 400 m figure is based only on the output from the camera and not the EKF. If the data fusion worked better, this result should be much greater.
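The extrapolation above reduces to a back-of-the-envelope error-rate calculation, sketched below under the assumption (consistent with the observed drift) that position error grows roughly linearly with distance travelled.

```python
# Rough extrapolation from the Case 3, frame skip 20 result.
allowed_error_m = 6.0      # one pipe length
achieved_range_m = 400.0   # distance before the error exceeds one pipe length
error_rate = allowed_error_m / achieved_range_m        # ~1.5 % of distance travelled

target_range_m = 1000.0
required_rate = allowed_error_m / target_range_m       # ~0.6 % needed for long water mains
print(f"current error rate: {error_rate:.1%}, required: {required_rate:.1%}")
```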


Bibliography

[1] Dennis Krys and Homayoun Najjaran. INS assisted visual object localization in an instructed environment. In CIRA - Computational Intelligence in Robotics and Automation, 2007.

[2] Dennis Krys and Homayoun Najjaran. INS assisted vision-based localization in unstructured environments. In SMC - Systems, Man and Cybernetics, 2008.

[3] Dennis Krys and Homayoun Najjaran. INS-assisted monocular robot localization. In ASME International Mechanical Engineering Congress & Exposition, 2010.

[4] I. Cox. Blanche: An experiment in guidance and navigation for an autonomous robot vehicle. In IEEE Trans. on Robotics and Automation, volume 7, pages 193-204, 1991.

[5] M. Carlowicz. Between Iraq and a hard place. Woods Hole Currents, 10(1), 2003.

[6] H. Najjaran. Robot torpedo to inspect water mains. Canada Focus, 13(9):4, August 2005.

[7] H. Najjaran. IRC creating an underwater robotic vehicle to inspect in-service water mains. Construction Innovation, 10(2), June 2005.

[8] Hafmynd. Gavia - The Great Northern Diver, April 2009. [Online] Available: www.gavia.is.

[9] Lorinda Semeniuk and Aboelmagd Noureldin. Bridging GPS outages using neural network estimates of INS position and velocity errors. Measurement Science and Technology, 17, 2006.

[10] Jinling Wang, Matthew Garratt, Andrew Lambert, Jack Jianguo Wang, Songlai Hana, and David Sinclair. Integration of GPS/INS/vision sensors to navigate unmanned aerial vehicles. In ISPRS - The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, volume XXXVII, part B1, pages 963-970, Beijing, July 2008.

[11] Dariusz Lapucha. Precise GPS/INS positioning for a highway inventory system. Master's thesis, Surveying Engineering, University of Calgary, 1990.

[12] N. Bulusu, J. Heidemann, and D. Estrin. GPS-less low-cost outdoor localization for very small devices. Personal Communications, 7(5):28-34, October 2000.

[13] M. Pachter. INS aiding using optical flow - theory. In ICNPAA 2004: Mathematical Problems in Engineering and Aerospace Sciences, 2004.

[14] Peter Lang, Miguel Ribo, and Axel Pinz. A new combination of vision-based and inertial tracking for fully mobile, wearable, and real-time operation. In 26th Workshop of the AAPR/OAGM, 2002.

[15] Sooyong Lee and Jae-Bok Song. Robust mobile robot localization using optical flow sensors and encoders. In International Conference on Robotics and Automation, 2004.

[16] G. Sudhir and John Lee. Video annotation by motion interpretation using optical flow streams. Technical report, Hong Kong University of Science and Technology, 1997.

[17] A. Giachetti, M. Campani, and V. Torre. The use of optical flow for road navigation. Robotics and Automation, 14:34-48, 1998.

[18] Espen Hagen and Eilert Heyerdahl. Navigation by optical flow. In Computer Vision and Applications, volume 1, pages 700-704, The Hague, Netherlands, 1992.

[19] Sebastian Thrun, Wolfram Burgard, and Dieter Fox. Probabilistic Robotics. The MIT Press, 2005.

[20] Dieter Fox, Sebastian Thrun, Wolfram Burgard, and Frank Dellaert. Particle filters for mobile robot localization. Sequential Monte Carlo Methods in Practice, 2001.

[21] F. Gustafsson, F. Gunnarsson, N. Bergman, U. Forssell, J. Jansson, R. Karlsson, and P.-J. Nordlund. Particle filters for positioning, navigation, and tracking. Signal Processing, 50:425-437, 2002.

[22] Volkan Cevher and James H. McClellan. Fast initialization of particle filters using a modified Metropolis-Hastings algorithm: Mode-hungry approach. In International Conference on Acoustics, Speech, and Signal Processing, 2004.

[23] Greg Welch and Gary Bishop. An introduction to the Kalman filter. Technical report, Department of Computer Science, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-3175, July 24, 2006.

[24] J. Sasiadek and P. Hartana. Sensor data fusion using Kalman filter. ISIF, WeD5:19-25, 2000.

[25] Robert Brown and Patrick Hwang. Introduction to Random Signals and Applied Kalman Filtering. John Wiley & Sons, 1992.

[26] Howie Choset, Kevin Lynch, Seth Hutchinson, George Kantor, Wolfram Burgard, Lydia Kavraki, and Sebastian Thrun. Principles of Robot Motion. The MIT Press, 2005.

[27] Aboelmagd Noureldin, Ahmed Osman, and Naser El-Sheimy. A neuro-wavelet method for multi-sensor system integration for vehicular navigation. Measurement Science and Technology, 15, 2004.

[28] Jonghyuk Kim and Galen Brambley. Dual optic-flow integrated navigation for small-scale flying robots. In Australasian Conference on Robotics & Automation, 2007.

[29] M. Ezhilarasan and P. Thambidurai. Simplified block matching algorithm for fast motion estimation in video compression. Journal of Computer Science, 4:281-289, 2008.

[30] David Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60:91-110, 2004.

[31] Patricio Loncomilla and Javier Ruiz-del-Solar. Lecture Notes in Computer Science, chapter Improving SIFT-Based Object Recognition for Robot Applications, pages 1084-1092. Springer Berlin / Heidelberg, 2005.

[32] Mårten Björkman and Danica Kragic. Combination of foveal and peripheral vision for object recognition and pose estimation. In IEEE International Conference on Robotics and Automation, 2004.

[33] A. Gyaourova, C. Kamath, and S. Cheung. Block matching for object tracking. 2003.

[34] Harvey Ho. 2D-3D block matching. Master's thesis, The University of Auckland.

[35] Steven L. Kilthau, Mark S. Drew, and Torsten Möller. Full search content independent block matching based on the fast Fourier transform. In IEEE ICIP, 2002.

[36] Shan Zhu and Kai-Kuang Ma. A new diamond search algorithm for fast block-matching motion estimation. In IEEE Transactions on Image Processing, 2000.

[37] Xuan Jing and Lap-Pui Chau. An efficient three-step search algorithm for block motion estimation. In IEEE Transactions on Multimedia, 2004.

[38] Andrés Bruhn, Joachim Weickert, and Christoph Schnörr. Lucas/Kanade meets Horn/Schunck: Combining local and global optic flow methods. International Journal of Computer Vision, 61:211-231, 2004.

[39] Iain Matthews and Simon Baker. Active appearance models revisited. International Journal of Computer Vision, 60:135-164, 2004.

[40] Li Zhang, Brian Curless, Aaron Hertzmann, and Steven Seitz. Shape and motion under varying illumination: Unifying structure from motion, photometric stereo, and multi-view stereo. In IEEE International Conference on Computer Vision, 2003.

[41] Gary Bradski and Adrian Kaehler. Learning OpenCV. O'Reilly, 2008.

[42] Simon Baker and Iain Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221-255, 2004.

[43] S. Baker and I. Matthews. Equivalence and efficiency of image alignment algorithms. In Computer Vision and Pattern Recognition, 2001.

[44] Leopoldo Jetto and Sauro Longhi. Development and experimental validation of an adaptive extended Kalman filter for the localization of mobile robots. IEEE Transactions on Robotics and Automation, 15:219-229, 1999.

[45] Mark W. Spong, Seth Hutchinson, and M. Vidyasagar. Robot Modeling and Control. Wiley, 2006.

[46] L. Jetto, S. Longhi, and D. Vitali. Localization of a wheeled mobile robot by sensor data fusion based on a fuzzy logic adaptive Kalman filter. Control Engineering Practice, 7:763-771, 1999.

[47] Alberto Bemporad, Mauro Di Marco, and Alberto Tesi. Sonar-based wall-following control of mobile robots. Journal of Dynamic Systems, Measurement, and Control, 122:226-230, 2000.

[48] Yaqin Tao, Huosheng Hu, and Huiyu Zhou. Integration of vision and inertial sensors for home-based rehabilitation. In IEEE International Conference on Robotics and Automation, 2005.

[49] Bruce Lucas and Takeo Kanade. An iterative image registration technique with an application to stereo vision. In IJCAI - Proceedings of the 7th International Joint Conference on Artificial Intelligence, pages 674-679, Vancouver, British Columbia, August 1981.

[50] M. Agrawal and K. Konolige. Real-time localization in outdoor environments using stereo vision and inexpensive GPS. In ICPR - International Conference on Pattern Recognition, 2006.

[51] MicroStrain Inc. 3DM-GX1, April 2009. [Online] Available: http://www.microstrain.com/3dm-gx1.aspx.

[52] Jerrold T. Bushberg, J. Anthony Seibert, Edwin M. Leidholdt Jr., and John M. Boone. The Essential Physics of Medical Imaging. Lippincott Williams & Wilkins, 2001.