OPTIMAL FEATURE DESIGN FOR VISION-GUIDED MANIPULATION

By

Azad Shademan
Bachelor of Science in Electrical Engineering, 2001
University of Tehran, Tehran, Iran

A thesis presented to Ryerson University in partial fulfilment of the requirements for the degree of Master of Applied Science in the program of Electrical and Computer Engineering.

Toronto, Ontario, Canada, 2005

© Azad Shademan 2005

Author’s Declaration

I hereby declare that I am the sole author of this thesis.

I authorize Ryerson University to lend this thesis or dissertation to other institutions or individuals for the purpose of scholarly research.

Signature:

I further authorize Ryerson University to reproduce this thesis or dissertation by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Signature:


Abstract

Optimal Feature Design for Vision-Guided Manipulation
Azad Shademan
Master of Applied Science, Electrical and Computer Engineering Department, Ryerson University, 2005

Numerous industrial applications use vision-guided manipulation, where cameras are used to generate the feedback control signal. Current vision algorithms select a set of image features to estimate pose in real time. Object design has received little attention in this context; more importantly, improper designs could lead to task failure. The focus of this thesis is on the optimal design of features for industrial objects. The goal is to construct the theory of optimal design for vision-guided manipulation. The problem is posed as a multi-objective optimization problem within an axiomatic-design theoretic framework. The visual and directional motion resolvability objectives are specified for a given 6-dimensional camera trajectory. Simulation results verify that the redesigned object satisfies the objectives. Practical implementation is addressed through camera calibration, pose estimation, and experiments on a real industrial object under a known camera trajectory.


Acknowledgment

First and foremost, I would like to express my sincere gratitude to my supervisor, Professor Farrokh Janabi-Sharifi, for his support throughout my graduate studies and also for providing the opportunity to work as a member of the Robotics and Manufacturing Automation Laboratory (RMAL) at Ryerson University. His meticulous supervision, neat ideas, scientific curiosity, and constant guidance have motivated me to aim for higher academic goals. I would also like to equally thank my beloved parents, Arsalan and Shahnaz, and my dear sister and her husband, Baharak and Afshin, whose unconditional tenacity, compassion, and encouragement are sources of divine inspiration for my life. Over the past two years, I have had the opportunity to be a roommate of a fellow MASc recipient and a reliable friend, Behdad, for whom I have a lot of respect. Together, we have challenged the harsh winter, enjoyed numerous discussions, done grocery shopping, and performed many other pleasant daily activities. Without his care, I would not have graduated. I would like to thank him for the memorable moments and his feedback about my work. Good luck in life, mate! Last but not least, I would like to thank the technical officer of the lab, Mr. Devin Ostrom, and the postdoctoral fellow of the lab, Dr. Iraj Hassanzadeh, for their availability and engineering skills in maintaining and troubleshooting the hardware setup.


Table of Contents

Abstract
Acknowledgment
Table of Contents
List of Figures
List of Tables
Nomenclature
1 Introduction
  1.1 Background
    1.1.1 Overview of robotic visual servoing
    1.1.2 Importance of pose estimation
    1.1.3 Importance of feature selection and tracking
  1.2 Motivation
  1.3 Objectives
  1.4 Summary of Contributions
  1.5 Thesis Organization
2 Theory of Design for Vision-Guided Manipulation
  2.1 Geometric Features versus Image Features
  2.2 An Axiomatic-Design Theoretic Formulation
    2.2.1 Principles of axiomatic design
    2.2.2 Functional requirements
    2.2.3 Design parameters
    2.2.4 Design matrix
  2.3 Mathematical Optimization
  2.4 Summary
3 Feature Design for Optimum Visual Measures
  3.1 Introduction
  3.2 Previous Work on Sensor Planning
  3.3 Visual Measures
    3.3.1 Field-of-view
    3.3.2 Focus
    3.3.3 Resolution
  3.4 Design for Visual Measures
    3.4.1 Problem statement
    3.4.2 Multi-objective optimization
  3.5 Summary
4 Feature Design for Optimum Directional Motion Resolvability
  4.1 Introduction
    4.1.1 Image Jacobian and image singularity
    4.1.2 Manipulability ellipsoid
    4.1.3 Direction dexterity measure
    4.1.4 Vision resolvability and motion perceptibility
  4.2 Design for Directional Motion Resolvability
    4.2.1 Problem statement
    4.2.2 Directional motion resolvability measure
    4.2.3 Multi-objective optimization
  4.3 Analysis
  4.4 Summary
5 Simulation Results
  5.1 Introduction
  5.2 Object Models
    5.2.1 Objects with 3D feature points
    5.2.2 Objects with circular holes
  5.3 Relative Camera-to-Object Trajectory
  5.4 Results
    5.4.1 Graphical user interface
    5.4.2 Evaluation of the field-of-view measure
    5.4.3 Design for directional motion resolvability
    5.4.4 Design for combined measures
  5.5 Discussion of Results
    5.5.1 Directional motion resolvability
    5.5.2 Combined visual measures and resolvability
  5.6 Summary
6 Experimental Results
  6.1 Introduction
  6.2 Hardware Setup
  6.3 Pose Estimation Using Kalman Filters
    6.3.1 Pose Estimation Using Extended Kalman Filter
    6.3.2 Pose Estimation Using Iterated Extended Kalman Filter
    6.3.3 Experimental Sensitivity Analysis
    6.3.4 Tuning of Filter Parameters
    6.3.5 Summary
7 Conclusions and Future Work
  7.1 Conclusions
  7.2 Future Theoretical Work
    7.2.1 Scale-invariant feature points in design for vision-guided manipulation
    7.2.2 Control stability measures in design for vision-guided manipulation
  7.3 Future Experimental Work
References
A Camera Calibration
  A.1 Introduction
  A.2 Camera Model
  A.3 Experiments

List of Figures

1.1 A simplified block diagram of PBVS.
1.2 Sample industrial object undergoes redesign process.
3.1 Field-of-view cone.
3.2 Simultaneous illustration of the field-of-view cone and the depth-of-field.
4.1 Illustration of the feature point expression in camera frame, task frame, and the projection onto the sensor plane (image plane).
4.2 Illustration of a camera on a known path Γ = {γ1, γ2, . . . , γL}.
5.1 Object models: (a) four coplanar points on a square, (b) four coplanar points not on a square, and (c) eight spatial points that form a cube.
5.2 Single hole feature and its projection onto the image plane.
5.3 Multiple hole features and their projections onto the image plane.
5.4 Automotive parts.
5.5 Circular pattern of hole features and their projections onto the image plane.
5.6 Multiple hole features and their projections onto the image plane.
5.7 6-dimensional parametric camera trajectory and the parametric projected trajectory on the image plane with their directions.
5.8 Snapshot of the developed GUI.
5.9 Evaluation of the field-of-view measure.
5.10 Directional motion resolvability objective function versus the design parameter.
5.11 Normalized objective functions vs. the design parameter and the min-max solution.
6.1 Hardware setup.
6.2 Optimal number of IEKF iterations.
6.3 Sensitivity of EKF to different values of Qk−1.
6.4 Sensitivity of IEKF to different values of Qk−1.
6.5 Sensitivity of EKF to different values of Rk.
6.6 Sensitivity of IEKF to different values of Rk.
6.7 Sensitivity of EKF and IEKF to different values of visual servoing speed.
6.8 Error measure for EKF and IEKF with different values of visual servoing speed.
6.9 Comparison of position estimates for three sets of equal displacements in each coordinate.
6.10 Sensitivity of EKF to different values of fC.
6.11 Sensitivity of IEKF to different values of fC.
A.1 Pinhole camera model.
A.2 Error analysis of the camera calibration process.
A.3 Illustration of camera extrinsic parameters.

List of Tables

5.1 Evaluation of the field-of-view measure.
A.1 Different notations for camera intrinsic parameters.
A.2 Estimated calibration parameters.

Nomenclature

FD : Vector of the functional requirements
PD : Vector of the design parameters
MD : Design matrix
Fi : The ith functional requirement
Pj : The jth design parameter
Mij : Element of the design matrix
χ : Vector of design variables
χk : The kth element of the design variable
X : Feasible design subspace
Rⁿ : n-dimensional space
Λ(χ) : Scalar objective function
εp(χ) : The pth equality constraint
ιq(χ) : The qth inequality constraint
Λ(χ) : Vector of objective functions in the multi-objective optimization problem
H(χ) : Weighted sum of objective functions
wi : Positive scalar weights of the objective functions
rC : Position vector of the camera frame
xC : x-coordinate of the camera frame
yC : y-coordinate of the camera frame
zC : z-coordinate of the camera frame
νC : Camera viewing direction
dC : Distance of the back nodal point to the image plane
aC : The entrance pupil diameter of the camera
fC : The focal length of the camera
α : 2 tan⁻¹(Imin/2dC), the field-of-view angle of the camera
Imin : The minimum dimension of the active area of the sensor plane
rK : The position vector of the apex of the field-of-view cone
rf : The position vector of the center of the sphere circumscribing the object features
Rf : The radius of the sphere circumscribing the object features
rA : The position vector of feature point A
rB : The position vector of feature point B
O : Origin of the world frame
βA : The angle between νC and the vector connecting feature point A to the apex of the field-of-view cone
D1 : The far depth-of-field plane perpendicular to the viewing direction
D2 : The near depth-of-field plane perpendicular to the viewing direction
c : The limiting blur circle diameter that is considered acceptable
δAB : The length of the line segment AB
uδ : The unit vector along AB
ω : The smallest resolvable length on the sensor plane
P³ : Projective space (3D)
P² : Projective plane
Q : Matrix of a quadric
C : Matrix of a conic
P : A point in P³, P = [X Y Z 1]⊤
Rh : Radius of a hole feature
Ph : Center of a hole, Ph = [Xh Yh Zh]⊤
M : The camera matrix
p : A point in P², p = [x y 1]⊤
CT : The top-left 2 × 2 submatrix of C
H : The transformation that maps the canonical conic to the original conic
CC : The canonical matrix of a conic
HR : The rotation matrix rotating the canonical conic to the original conic
HT : The translation matrix translating the canonical conic to the original conic
λk : The eigenvalues of CT, k = 1, 2
ΛD : Diagonal matrix of conic C
Γ : Camera trajectory, Γ = {γ1, . . . , γL}
L : The number of discrete points on Γ
γk : The pose of the camera at instant k
xCk : The x-coordinate of the camera position at instant k
yCk : The y-coordinate of the camera position at instant k
zCk : The z-coordinate of the camera position at instant k
θxC : The absolute angle of the camera frame around the x-axis
θyC : The absolute angle of the camera frame around the y-axis
θzC : The absolute angle of the camera frame around the z-axis
ν : Fixed camera intrinsic parameters
Tχ ∈ R4N : The vector of design variables (a point in the design space)
N : Number of features
ΛFOV(Tχ) : The field-of-view measure for the design variable Tχ
ΛFOC(Tχ) : The focus measure for the design variable Tχ
ΛRES(Tχ) : The pixel resolution measure for the design variable Tχ
ΛDIR(Tχ) : The directional motion resolvability measure for the design variable Tχ
gFOV(Tχ) : The scalar objective function corresponding to the field-of-view measure
gFOC(Tχ) : The scalar objective function corresponding to the focus measure
gRES(Tχ) : The scalar objective function corresponding to the pixel resolution measure
gDIR(Tχ) : The scalar objective function corresponding to the directional motion resolvability measure
{T} : Task frame
TPj : Cartesian coordinates of the jth feature point expressed in the task frame
TXj : x-coordinate of the jth feature point expressed in the task frame
TYj : y-coordinate of the jth feature point expressed in the task frame
TZj : z-coordinate of the jth feature point expressed in the task frame
{C} : Camera frame
CPjk : Cartesian coordinates of the jth feature point at instant k expressed in the camera frame
CXj : x-coordinate of the jth feature point expressed in the camera frame
CYj : y-coordinate of the jth feature point expressed in the camera frame
CZj : z-coordinate of the jth feature point expressed in the camera frame
CVk : Linear velocity of the camera frame at instant k expressed in the current camera frame
CΩk : Absolute angular velocity of the camera frame at instant k expressed in the current camera frame
{S} : Image frame (also called sensor frame)
JkP,j : Position part of the image Jacobian for feature j and the camera frame at instant k
JkO,j : Orientation part of the image Jacobian for feature j and the camera frame at instant k
Φ : Parameter set of the image Jacobian, Φ = (fC, sx, sy, CXjk, CYjk, CZjk)
Jkj(Φ) : The image Jacobian for feature j and the camera frame at instant k
JkI(Φ) : Total image Jacobian for all feature points
{B} : Base frame
BPe : Pose of the end-effector in the base frame
J : Geometric Jacobian
q : Vector of the joint variables in C-space (configuration space)
J† : The pseudo-inverse of J
wm : Manipulability measure
σi : The ith eigenvalue of JJ⊤ (the ith singular value of J)
cJ : Condition number of J
f(b) : An n-dimensional vector function with scalar parameter b
D : Camera path direction
‖·‖ : The L2-norm
v : Direction dexterity measure
H(Tχ) : The weighted sum of objective functions
F(Tχ) : −H(Tχ)
Sp : Projection of the feature point expressed in the sensor frame
Mext : Extrinsic camera matrix
Mint : Intrinsic camera matrix
R : Rotation matrix that rotates the world frame onto the camera frame
T : Translation that maps the origin of the world frame onto the origin of the camera frame
sx : Pixel spacing along the x-coordinate
sy : Pixel spacing along the y-coordinate
Cx : x-coordinate of the piercing point of the camera
Cy : y-coordinate of the piercing point of the camera
Sxu : Undistorted x-coordinate of the projection
Syu : Undistorted y-coordinate of the projection
Sxd : Distorted (true) x-coordinate of the projection on the sensor plane
Syd : Distorted (true) y-coordinate of the projection on the sensor plane
Dx : Radial distortion compensation along the x-axis
Dy : Radial distortion compensation along the y-axis
κn : Distortion coefficients, n = 1, 2, . . .
Sxf : x-coordinate of the feature point from the grabbed image
Syf : y-coordinate of the feature point from the grabbed image
Wr : Relative pose between the camera frame and the task frame
W̃r : The difference of the desired and the estimated relative pose
Wd : Desired relative pose
Ŵr : Estimated relative pose
x : State variable (Kalman filter equations)
xk : State at step k
x̂k|k−1 : A priori state estimate at step k
x̂k|k : A posteriori state estimate at step k given measurement zk
ek|k−1 : A priori estimate error
ek : A posteriori estimate error
zk : Measurement vector
G(xk) : Nonlinear measurement function
ωk : Process noise
Qk : Covariance matrix of the process noise
νk : Measurement noise
Rk : Covariance matrix of the measurement noise
Pk|k−1 : A priori error covariance
Pk|k : A posteriori error covariance
Kk : Kalman gain
ρik : Measure of stability for the pose estimates

Chapter 1 Introduction

The design of an object is an under-constrained problem, since there is flexibility in modifying the design within the feasible design space. Feasible design solutions that meet the design specifications can be further optimized to satisfy a new constraint. In this thesis, the problem of design for vision-guided manipulation is analyzed and a theory is developed to address the optimal design of features for industrial objects. The under-constrained feasible design space is constrained by applicability in vision-guided manipulation. Since the new design goal lies within the context of vision-guided control of robots, a solid background in this area is required to formulate the problem. An overview of robotic visual servoing is presented in Section 1.1.1. The importance of the pose estimation problem in computer vision is outlined in Section 1.1.2. The feature selection problem is equally important in a visual servoing system; an overview of feature selection and tracking is presented in Section 1.1.3. The rest of this chapter is organized as follows. The motivation is described in Section 1.2. The objectives of the thesis are described in Section 1.3. A summary of the contributions is provided in Section 1.4. Finally, the organization of the thesis is given in Section 1.5.

1.1 Background

1.1.1 Overview of robotic visual servoing

Robotic Visual Servoing (RVS), or vision-guided control of robots, is the task in which the end-effector of a manipulator is aligned at a desired relative pose with respect to the object of interest using real-time feedback of visual information from a single view or multiple views [1, 2]. High-speed RVS is an integral part of macro-level and micro-level robotic systems that use vision to increase the accuracy of manipulation and enhance robustness to uncertainties stemming from an unstructured workspace. Conventional robotic systems fail to operate correctly in the presence of uncertainties, because they are built to perform simple repetitive tasks.

Different control laws result in different RVS architectures. In position-based visual servoing (PBVS) [3], the relative pose of the end-effector with respect to the object in Cartesian space is estimated and then used by the controller. In image-based visual servoing (IBVS) [4], the feedback signal is generated directly from the visual information in the image plane. There is no need to measure the exact spatial coordinates of the features in 3D space, since a well-chosen feedback architecture compensates for inaccuracies in camera calibration and pose estimation. Furthermore, when the feedback control law is defined in the image plane, the effect of inaccuracies in the kinematic modeling of the manipulator is reduced. However, in many situations PBVS performs better than IBVS [5]. This is mainly because the so-called well-chosen feedback controller for IBVS is difficult to achieve, and the controller often diverges due to image singularities. In contrast, with PBVS conventional controllers are easily used after determining the relative pose between the object frame and the manipulator end-point in Cartesian space. The focus of the present work is on PBVS.

Suppose that the relative pose between the camera frame and the task frame is denoted by Wr = [Xr, Yr, Zr, ψr, θr, φr]⊤, where Tr = [Xr, Yr, Zr]⊤ is the relative translation and [ψr, θr, φr]⊤ is the relative orientation (Euler angles, yaw-pitch-roll) between the frames. Such a relative pose is found by solving the nonlinear projection equations of the pinhole camera using three or more known 3D-2D point correspondences from the object frame to the camera image plane:

(CXj, CYj, CZj) ↔ (Sxj, Syj),   j = 1 . . . N,

where N is the number of feature points; it must be larger than three and is usually taken as five or six so that the pose estimates are reliable. In addition, the features should be non-coplanar for a flawless pose estimate [5].

The Cartesian controller in a PBVS uses W̃r, which is the difference of a desired relative pose, Wd, with the estimated relative pose, Ŵr. It is worth noting that W̃r is expressed in Euler angles, but the control law is based on the absolute angles; hence, the Euler angles should be converted back to absolute angles accordingly [3, 6]. Fig. 1.1 illustrates a simplified PBVS structure. As shown, the pose estimation module utilizes the output of the image processing and feature extraction methods. It is often advantageous for the pose estimation method to provide a prediction of the next estimate, so that the image processing can be performed in a smaller region of interest in which the features are likely located.
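The following sketch makes the nonlinear 3D-2D mapping concrete. It is illustrative only: it assumes a normalized pinhole model with focal length fC, a Z-Y-X yaw-pitch-roll Euler convention, and hypothetical function names that are not taken from the thesis.

```python
import numpy as np

def euler_zyx_to_R(yaw, pitch, roll):
    """Rotation matrix from yaw-pitch-roll (Z-Y-X) Euler angles (assumed convention)."""
    cy, sy = np.cos(yaw), np.sin(yaw)
    cp, sp = np.cos(pitch), np.sin(pitch)
    cr, sr = np.cos(roll), np.sin(roll)
    Rz = np.array([[cy, -sy, 0], [sy, cy, 0], [0, 0, 1]])
    Ry = np.array([[cp, 0, sp], [0, 1, 0], [-sp, 0, cp]])
    Rx = np.array([[1, 0, 0], [0, cr, -sr], [0, sr, cr]])
    return Rz @ Ry @ Rx

def project_points(W_r, P_task, f_C):
    """Map task-frame feature points to image-plane coordinates for a candidate
    relative pose W_r = [X, Y, Z, psi, theta, phi] (translation + yaw-pitch-roll)."""
    t = np.asarray(W_r[:3], dtype=float)
    R = euler_zyx_to_R(*W_r[3:])
    P_cam = (R @ np.asarray(P_task, dtype=float).T).T + t   # (CX_j, CY_j, CZ_j)
    x = f_C * P_cam[:, 0] / P_cam[:, 2]                     # Sx_j
    y = f_C * P_cam[:, 1] / P_cam[:, 2]                     # Sy_j
    return np.column_stack([x, y])

# A pose estimator inverts this mapping: given N (in practice five or six)
# non-coplanar correspondences, it solves for W_r, e.g. by nonlinear least squares.
```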

Figure 1.1: A simplified block diagram of PBVS. The pose of the end-effector expressed in the task frame is represented by TWe.

1.1.2 Importance of pose estimation

The nonlinear problem of pose estimation is one of the most challenging problems in robot vision and has received extensive attention during the past two decades. To be more specific, control and positioning of an eye-in-hand robot, where a camera is mounted on the end point of a manipulator, requires accurate estimation of the rigid translation and rotation of the camera frame with respect to the object frame. A set of 3D points on the object is mapped to their projections on the 2D image plane to find the relative pose between the object and the camera. The equations that map these two sets are nonlinear in nature and hence make the problem challenging, particularly if real-time pose estimation is desired.

One class of solutions to the pose estimation problem, which uses iterative weighted least-squares techniques, is described by Haralick et al. [7]. Lu et al. [8] have presented a fast and globally convergent pose estimation method motivated by the idea of Haralick et al. [7]. They have argued that the earlier method had slow local convergence and hence did not receive much attention, although it appeared to be globally convergent. A good, yet dated, survey of robust pose estimation methods, along with their sensitivity analysis, is given by Kumar and Hanson [9]. They mainly focus on least-squares and robust methods, and provide a sensitivity analysis for inaccurate estimates of the camera intrinsic parameters.


Another class of pose estimation methods, which is of more interest to the visual servoing community, uses the EKF to solve the nonlinear equations. An advantage of using a Kalman filter is that it predicts the next state, which can be further used by other modules such as feature extraction and image processing. The EKF was first used by Wu et al. [10] to find 3D location and was then studied for the PBVS problem by Wilson et al. [3], where the relative pose is estimated in real time. The popularity of the EKF mainly stems from the fact that many nonlinear systems have quasi-linear behavior. In addition, the EKF is convenient and fast for real-time processing and quite straightforward to implement if a priori information about the measurement and process noise covariance matrices is available. However, it is well known that, unlike the Kalman filter, the EKF is not an optimal filter, because linearization can generate instability when the assumption of local linearity is not met [11]. In addition, the convergence of the standard EKF is very dependent on the choice of the initial state estimate, and tuning of the filter parameters is crucial to the success of the estimates.

Lefebvre et al. [12] have studied several modifications of Kalman filters for nonlinear systems. They categorize the different versions of Kalman filters, such as the Central Difference Filter (CDF), the Unscented Kalman Filter (UKF), and the Divided Difference Filter (DD1), as Linear Regression Kalman Filters (LRKF) and compare them to the EKF and IEKF [13, 14]. They mention that the EKF and IEKF are generally better than LRKFs, yet they require careful tuning. An interesting result of their study is that the IEKF outperforms the EKF, because it uses the measurements to linearize the measurement function, whereas in the EKF and LRKF the measurement is not used [12].

The tuning of a standard Kalman filter (with linear equations) is described in the widely cited introductory paper of Welch and Bishop [15]. Ficocelli and Janabi-Sharifi [16] have addressed the issue of adaptive selection of the measurement noise covariance matrix for the EKF.

However, the tuning of the filter parameters in the case of the IEKF has not received much attention. We will systematically approach the tuning problem in this thesis. Moreover, we address the speed issue and study how fast our visual servoing system can operate under the proposed IEKF-based pose estimation if a fast feature selection method is available. It is evident that the IEKF requires more computations (due to its iterative nature) than the EKF. This affects the system bandwidth and hence the maximum visual servoing speed. It is important to note that the number of iterations required for the estimates to converge does not put the IEKF at a disadvantage relative to the EKF, since the total computation time is much less than the time required for feature selection and image processing. We have investigated the improvements that the IEKF offers in Chapter 6.
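The difference between the EKF and IEKF updates discussed above can be made concrete with a short sketch. This is a generic textbook-style IEKF measurement update, not the thesis implementation; the function and variable names, the fixed iteration count, and the omission of the time-update step are all assumptions.

```python
import numpy as np

def iekf_update(x_pred, P_pred, z, h, H_jac, R, n_iter=3):
    """One Iterated EKF measurement update (generic sketch, not the thesis code).
    x_pred, P_pred : a priori state estimate and covariance
    z              : measurement vector (e.g. image-plane feature coordinates)
    h, H_jac       : nonlinear measurement function and its Jacobian
    R              : measurement noise covariance
    """
    x_i = x_pred.copy()
    for _ in range(n_iter):                       # n_iter = 1 reduces to the EKF update
        H = H_jac(x_i)                            # relinearize about the current iterate
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)
        # The iterated form keeps the residual referenced to the a priori estimate:
        x_i = x_pred + K @ (z - h(x_i) - H @ (x_pred - x_i))
    P_upd = (np.eye(len(x_pred)) - K @ H) @ P_pred
    return x_i, P_upd
```

Using the measurement itself to refine the linearization point is what gives the IEKF its advantage over the EKF at the cost of the extra iterations.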

1.1.3 Importance of feature selection and tracking

High-speed vision systems are a fundamental requirement for successful robotic visual servoing. Two factors contribute to the speed of the vision system: the hardware setup and the algorithm. With current high-speed cameras and frame grabbers, the speed-limiting factor of today's systems is the algorithm. Image features are often selected once and then tracked from frame to frame to reduce the computational burden.

There is an extensive literature on the selection of features for visual tracking. An important class of contributions in this field concerns cross-correlation-based trackers, which are ideal for high-speed vision algorithms. The speed is usually enhanced by placing a window around an image feature-point and processing only a small neighbourhood of the feature-point. Tomasi and Kanade [17] introduce a matching criterion for feature selection based on optimizing tracking performance under small image motion.

Shi and Tomasi [18] improve on this work by computing the affine image motion using two measures: texturedness and dissimilarity. They show that this selection of features maximizes the quality of tracking. The sum-of-squared-differences (SSD) tracker and its modified versions were introduced by Papanikolopoulos, Nelson, and Khosla [19, 20]. The SSD tracker is a gradient-based method and is accurate only if the feature window shows image gradients and the intensity levels have sufficient disparity to provide accurate results in the presence of noise [21]. Feddema et al. [22] have developed a weighted feature selection method, which optimizes a weighting function of robustness, completeness, uniqueness, controllability, and sensitivity to visually control the relative pose between the end-effector of a robot and an object. They use small area-of-interest windows to increase the image processing speed. As can be anticipated, area-of-interest windowing techniques are important for high-speed feature tracking algorithms. The present work uses simple and fast binary image processing methods for feature selection and feature tracking, similar to the work presented in [23] and [24]. The details of the image processing and feature tracking algorithms used in the present thesis have been provided in the form of a technical report [25] and are not reproduced here.
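The window-based SSD idea referred to above can be sketched as follows. This is a generic illustration of SSD matching over a small area-of-interest, not the tracker used in this thesis; the window sizes and names are assumptions.

```python
import numpy as np

def ssd_track(prev_img, next_img, feat, win=7, search=10):
    """Track one feature point between frames by sum-of-squared-differences
    window matching (generic sketch, not the thesis code)."""
    r, c = feat                                    # feature assumed well inside the image
    tmpl = prev_img[r - win:r + win + 1, c - win:c + win + 1].astype(float)
    best, best_ssd = feat, np.inf
    H, W = next_img.shape
    for dr in range(-search, search + 1):          # small area-of-interest search window
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            if rr < win or cc < win or rr + win >= H or cc + win >= W:
                continue                           # candidate window falls off the image
            cand = next_img[rr - win:rr + win + 1, cc - win:cc + win + 1].astype(float)
            ssd = np.sum((cand - tmpl) ** 2)
            if ssd < best_ssd:
                best, best_ssd = (rr, cc), ssd
    return best
```

Restricting the search to a small window around the previous feature location is precisely what keeps such trackers fast enough for servoing rates.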

1.2 Motivation

As mentioned, vision-guided manipulation is an active area of research with numerous industrial applications such as robotic assembly, robotic surgery, and space robotics. Conventional robots are suitable for simple repetitive tasks in highly structured environments, whereas vision-guided robots operate in less structured environments while enhancing the accuracy and robustness of an operation. The current focus of the research community is on the problem of object and image feature selection, where an optimal set of features already available on an industrial object is selected to fulfill the visual servoing task [22, 26, 27].

In addition to the robust and automatic selection of image features, the design of the objects themselves could significantly improve the overall robustness and accuracy of robotic tasks that incorporate vision as a source of feedback. This thesis is a natural continuation of previous research on automatic feature selection for visual servoing carried out at the Robotics and Manufacturing Automation Laboratory (RMAL), Ryerson University [28, 29]. The focus of the present work is on the design of a set of feature points for robust vision-guided control of manipulators in semi-structured environments using a calibrated camera. The ultimate goal of this work is to develop the theory and implementation of optimal feature design for vision-guided manipulation and visual servoing. It can be anticipated that such a theory will have an interdisciplinary nature. We have based our presented theory on the axiomatic design theory. In order to construct a coherent new theory, the different areas of vision sensor planning, position-based and image-based visual servoing, and feature selection in robot vision should be integrated into the axiomatic design framework.

A typical application of the presented theory is in the design of objects that are used in both macro-level and micro-level vision-guided robotic tasks. The vision sensor is used to determine the relative pose of the end-effector of a robotic manipulator with respect to a desired object. Having the relative pose, the end-effector can be controlled to approach the task as desired. The main constraint in current systems is the instability of the overall visual servoing process due to matrix singularities in the pose estimation calculations or image singularities. It is worth emphasizing that the feature points and their configuration on an object are extremely important for the success of the visual tracking process (the notions of features and feature points are explained in Section 2.1). A set of features optimized for better stability is desired to attain a robust system. This thesis attempts to construct a self-contained theory of design for vision-guided manipulation to improve the visual constraints and the motion sensitivity of the end-effector. Under the new design, the system would be more robust with respect to path planning and inherent feature extraction noise. The new visual and motion constraints are formulated such that a newly designed object suits vision-guided applications, in addition to maintaining its previous mechanical properties.

1.3 Objectives

The aim of the work presented in this thesis is to construct a unified theory of feature design for vision-guided manipulation, with applications to vision-guided motion planning of robots, model-based visual servoing, and model-free visual servoing in both the micro and macro domains. Figure 1.2 shows a sample industrial object with some hole features and a redesigned object with four of the features optimized for vision-guided manipulation. The redesigned features are indicated with arrows. Features are defined in detail in Section 2.1.


Figure 1.2: Sample industrial object undergoes redesign process. The holes are used to assemble other parts to this part. (a) Snapshot of the object, (b) original CAD layout without considering the design for vision-guided manipulation, and (c) some of the hole features shown with arrows are redesigned to suit vision-guided manipulation. The object and CAD model are courtesy of Devin Ostrom, Ryerson University. Reprinted and altered (shown on right) with permission.


1.4 Summary of Contributions

The major contribution of this work is the development of the theory and implementation of design for vision-guided manipulation. Design for X is a well-known concept in science and engineering, where X can be any task such as assembly or manufacturability [30]; in our theory, X is vision-guided manipulation. The presented theory is built upon the fundamentals of axiomatic design theory. Using the presented theory, objects can be redesigned to yield more accurate and more robust vision-guided manipulation and visual servoing. Part of the design problem is stated as the inverse of the well-known vision sensor planning problem. In vision sensor planning, a generalized camera viewpoint is found that meets the visual task constraints. In our design paradigm, we define a new set of functional requirements and accordingly find an appropriate new set of design parameters, such that when a camera moves along a known trajectory in space, the visual constraints are guaranteed to be met.

Another contribution is a new directional motion resolvability measure used to design optimal locations of object feature points. The proposed optimal design contributes to position-based and image-based visual servoing, visual motion planning, and robotic vision-guided manipulation. A conference paper based on this chapter has already been presented and published [31].

In the experimental part, we contribute by implementing and analyzing a more robust pose estimation method in a position-based visual servoing framework, which is used for the implementation of the experiments. In the newly presented method, the Iterated Extended Kalman Filter (IEKF) is utilized to find more reliable pose estimates in real time. The experimental results show that the IEKF-based pose estimation is superior to the traditional EKF-based pose estimation. A conference paper based on this chapter has already been presented and published [32].

1.5 Thesis Organization

A good research article provides the necessary background before getting into the details. It would be next to impossible to review all of the necessary background of an interdisciplinary research project within a single section, since different disciplines are integrated; instead, the principles of the different areas are reviewed in their respective chapters prior to explaining the contributions. In this thesis, the proposed theory and simulation results are presented first, and the experimental setup, an improvement to the pose estimation process, and future work are presented next.

In Chapter 2, the notion of features is clearly outlined to avoid confusion between the mechanical engineering and robotic vision communities. The problem is formally stated within the axiomatic design theory framework. The new set of functional requirements, the new design parameters, and the new design matrix are identified. The mathematics of the multi-objective optimization that is utilized to solve our design problem is also explained in that chapter.

A major part of the presented theory deals with solving the inverse problem to vision sensor planning in a robotic cell, as will be discussed in Chapter 3. Exposure to the direct problem is valuable on many levels; hence, a brief survey of vision sensor planning in robotics is presented in Chapter 3 to further assist the reader in digesting the presented theory. We provide details on how we attempt to solve the feature design problem as the inverse problem to vision sensor planning. The visual constraints and their corresponding objective functions are also explained in that chapter. The optimal design is a multi-objective optimization problem in nature, with the different visual constraints each acting as a separate objective function. Simulation results on test objects are also provided.

In Chapter 4, the new directional motion resolvability measure is introduced, and the method for designing feature point locations so as to optimize the measure is described. The simulation results on test objects are provided in Chapter 5. A graphical user interface that has been developed for simulation purposes is presented in that chapter. An evaluation of the measures introduced in the previous chapters and results on some of the test objects are also provided. The experimental setup is described in Chapter 6. Real-time pose estimation from features is an integral part of a typical visual servoing task. The well-known EKF-based pose estimation method is improved in that chapter by introducing the use of the Iterated EKF (IEKF) for pose estimation. The sensitivity of the IEKF-based pose estimation is also compared to that of the EKF-based method. In Chapter 7, we conclude the thesis and outline future improvements in both the theory and the experiments. Appendix A covers camera calibration, which is the first important step in building the experimental setup.


Chapter 2 Theory of Design for Vision-Guided Manipulation

In this chapter, the general theory of design for vision-guided manipulation is presented. This theory is a blend of classic mechanical design theory and vision-guided robot control, and it lays out the fundamentals of feature design for objects that will be used in robotic visual servoing systems. An important concept used frequently in this thesis is the notion of “features”. We shall first describe the different notions of features in different scientific communities to resolve ambiguities. The design process, the design parameters, and the functional requirements are then described in detail. We also cover multi-objective optimization principles in order to express the feature design problem as a sound mathematical problem.

2.1 Geometric Features versus Image Features

The term feature is used in different communities to refer to different concepts. It is important to distinguish between the different concepts and to clearly specify which one is used in this work.

A part's geometry conveys a lot of information. In the mechanical design community, such information is characterized as geometric features. More specifically, geometric features have the following subsets [33]: form features, tolerance features, and assembly features. Form features refer to the part's ideal geometry. Tolerance features refer to the variation that can be tolerated from the form features. Finally, assembly features denote the relationships between parts that are used in mechanical assembly. If a CAD system uses the above information in the design of a particular part, feature recognition can be performed automatically and with a high success rate. This is a very interesting subject and the motivation behind feature-based CAD/CAM design research [33].

In computer vision and image processing, the term feature refers to an array of information that describes a larger set. For example, an image with 512 rows and 512 columns of gray-level pixels (8 bits per pixel) contains 2,097,152 bits (262,144 bytes) of data. Processing such a huge amount of information in real time is not currently practical. Moreover, individual pixels are very sensitive to variations in illumination and to image noise. Features can instead be computed from the raw image data to obtain a robust set that describes the image well. A recent example of a robust feature set that is invariant to illumination, scale, and rotation is the acclaimed Scale-Invariant Feature Transform (SIFT) [34].

In the robotic vision community, geometric primitives such as lines, edges, and circles (holes) are usually used as features for tracking and pose estimation [22, 2, 26]. There are two advantages in using such simple features: their availability on most industrial parts, and the ease of their automatic extraction. These features can be generalized to include the form features, which makes them attractive to both the mechanical design community and the robotic vision community. Although fast image processing techniques may be used to extract such features, a very structured environment should be used to ensure the validity of the feature extraction results.

In this thesis, we use the latter concept when we use the term feature. A feature that is represented in the task frame is called an object feature. The projection of that feature onto the image plane is called an image feature. A feature-point is the location where a feature is located. For example, if a hole on the part is taken as an object feature, the centroid of the hole can be taken as a feature-point. Since we mainly deal with feature-points in this thesis, object/image feature-point and object/image feature are used interchangeably.
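As a minimal illustration of taking the centroid of a hole as its feature-point (assuming an already segmented binary image; the function name is illustrative and not from the thesis's image processing report [25]):

```python
import numpy as np

def hole_feature_point(binary_img):
    """Return the centroid of a segmented hole (a blob of 1s in a binary image)
    as its feature-point; an illustrative sketch, not the thesis's pipeline."""
    rows, cols = np.nonzero(binary_img)
    if rows.size == 0:
        raise ValueError("no feature pixels found")
    return rows.mean(), cols.mean()    # (row, column) centroid in pixel coordinates

# Example: a small synthetic image with a 3x3 "hole"
img = np.zeros((8, 8), dtype=np.uint8)
img[2:5, 3:6] = 1
print(hole_feature_point(img))         # -> (3.0, 4.0)
```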

2.2 An Axiomatic-Design Theoretic Formulation

2.2.1 Principles of axiomatic design

One of the well-established paradigms in engineering is design. Principles of axiomatic design were introduced by Nam P. Suh in his celebrated book [35]. According to this work, a good design is determined solely by two axioms: the Independence Axiom and the Information Axiom. Before getting into the details of the axioms, the basic concepts of the functional requirement (FR) and the design parameter (DP) should be introduced. The goal of any product design process is to meet the specific needs of that product. Such needs are specified by FRs in the functional domain. By definition, one FR should be independent of another FR; otherwise, they are both different interpretations of the same FR. In other words, the minimum set of needs that completely meets the design objectives is referred to as the FRs. The set of statements that characterize the physical behaviour in the physical domain is referred to as the DPs. The design process involves a mapping from the DPs to the FRs through a design matrix. Mathematically speaking, the design process is expressed in the form of the design equation as follows:

FD = MD PD,    (2.1)

where FD = [F1 . . . Fm]⊤ is the vector containing the FRs, PD = [P1 . . . Pn]⊤ is the vector containing the DPs, and MD is the m × n design matrix. The elements of the design matrix, Mij, can generally be expressed as

Mij = ∂Fi / ∂Pj.    (2.2)

As mentioned previously, there are two design axioms. The independence of the FRs is guaranteed by the Independence Axiom; any proper design considers the independence of the FRs. A design in which two FRs are not independent is called a coupled design, while a design that meets the Independence Axiom is called an uncoupled design. A coupled design is not desirable; examples of coupled designs can be found in [35]. It is worth noting that a coupled design may be decoupled if the coupling is caused by an insufficient number of DPs that results in the dependence of FRs. In this case, decoupling a design involves introducing more DPs so that the FRs become independent. The Information Axiom, on the other hand, ensures minimum complexity of the design by minimizing the information content of the design.

Many propositions may be derived from these two axioms in the form of corollaries. The proposed corollaries [35] are: decoupling of coupled designs, minimization of FRs, integration of physical parts, use of standardization, use of symmetry, largest tolerance, and uncoupled design with less information. These corollaries assist a designer in specific design decisions. More importantly, many theorems may be derived from the above axioms. As we have built our design scheme upon the available design theory, we shall use many of the available design theorems later in the following sections and chapters. We describe the most relevant theorems in more detail here and provide clues on how they relate to our problem.

The first issue is the determination of the ideal number of FRs and DPs. Given that we are interested in designing objects that are currently used in industry, an original design solution is available to us. Without loss of generality, we shall assume that the available design solution is ideal. Two theorems deal with the situation where the number of DPs is less than the number of FRs. In this case, the design matrix has more rows than columns, so that either a coupled design results or some of the FRs cannot be satisfied by the DPs. Furthermore, if a subset of the design matrix is triangular, a coupled design may be decoupled by introducing enough DPs such that the numbers of DPs and FRs are equal. These results form the theorem of coupling due to insufficient number of DPs and the theorem of decoupling of coupled design. If the number of DPs is larger than the number of FRs, some of the DPs do not actually affect the design, and therefore the design is either coupled or redundant; this is called the theorem of redundant design. The theorem of ideal design, on the other hand, states that in an ideal design the number of DPs is equal to the number of FRs. So far, we have covered the first four basic design theorems. We can conclude that any ideal design solution has an equal number of FRs and DPs. We shall introduce a new set of FRs such that the object will be suitable for applications that involve vision-guided control of manipulators. In design terminology, our problem is to find a new set of DPs in the physical domain that provides an uncoupled design for the new set of FRs.

We shall now move on to explain how a new set of DPs can be found to obtain an optimum design. It would be a common mistake not to redesign the object, but just to add additional DPs [35]. The theorem of need for new design states that when new FRs are introduced, the design solution given by the original DPs cannot satisfy the new set of FRs, and a new design solution must be sought. This is very important. One of the mistakes that a designer may make in designing objects suitable for vision-guided manipulation is to add easily recognizable markers; according to the above theorem, this would not necessarily lead to the optimum design. An explanation of all of the design theorems discussed in [35] is beyond the scope of this thesis; we have only introduced the most important theorems related to our design paradigm. The rest of the theorems mainly deal with the Information Axiom and the practical implications of the Independence Axiom.

2.2.2 Functional requirements

An ideal design has an equal number of functional requirements and design parameters. The available ideal design is meant to be redesigned to meet the requirement of applicability in vision-guided manipulation. The fact that an ideal design is available suggests that the number of functional requirements and the number of design parameters are equal. Adding new functional requirements implies that new design parameters must be introduced, and the theorem of need for new design infers that instead of trying to meet the new functional requirements with the new design parameters, a new design must be sought. Failing to do so may result in a coupled design in which the new design parameters affect the previous functional requirements. For example, if an easily recognizable marker is added to a sensitive medical object, some of the functional requirements may be affected in an undesired manner. Moreover, in some cases, such as design for micro-assembly, adding a marker is not possible, since the assembled parts are fabricated using sophisticated silicon processing techniques.

One assumption that we have made is that the old functional requirements are met by optimizing a set of design parameters [36]. With this assumption and without loss of generality, the old design parameters are chosen from the feasible domain of the model. With the introduction of the new design parameters, new values for the old design parameters would be obtained after optimizing an objective function. The functional requirements Fi for redesigning an object for vision-guided manipulation were initially derived as follows:

Fk = Maintain the original mechanical characteristics of the design, while redesigning the object to be used in conjunction with the vision module. The number of old functional requirements is supposed to be n; therefore, k = 1 . . . n.
Fn+1 = Improve the features for better visual quality (field-of-view, focus, and resolution).
Fn+2 = Improve the features for better camera-to-object relative-motion resolvability without altering the specifications of the industrial object.
Fn+3 = Improve the features for faster computational speed.

A closer look at the above FRs reveals that some of them are affected by the same DPs from the physical domain; therefore, they have the same nature and should be combined. For example, the design of a feature point to be in focus affects the motion resolvability of an object, since both measures are affected by the same design variable, i.e., the position of the feature points. Moreover, all of the FRs must be stated independently, such that a DP affecting a specific FR does not alter another FR. For example, the resolution of a hole feature is specified by its diameter and not by the location of its center. The functional requirements must be rewritten as follows to meet the Independence Axiom:

Fk = Maintain the original mechanical characteristics of the design, while redesigning the object to be used in conjunction with the vision module. The number of old functional requirements is assumed to be n; therefore, k = 1 . . . n.
Fn+1 = Improve the features for better field-of-view, focus, and camera-to-object relative-motion resolvability without altering the specifications of the industrial object.
Fn+2 = Improve the features for better pixel resolution and faster feature extraction (faster computational speed).

2.2.3 Design parameters

The design parameters of the old design are used in conjunction with the new design parameters in an optimization formalism to attain the optimum design. With the FRs introduced in the previous section, the design parameters are described as follows:

Pk = Design variables used to satisfy the old n functional requirements, k = 1 . . . n.
Pn+1 = Configuration of the feature points and their 3D locations in the object frame.
Pn+2 = Size of the feature points (directly affecting resolution) and the contrast between the features and the background, for fast image processing and feature tracking techniques (see Section 1.1.3).

2.2.4 Design matrix

The design equation in (2.1) suggests that a square design matrix is obtained with FD = [F1 . . . Fn+2]⊤ and PD = [P1 . . . Pn+2]⊤. The elements of the design matrix from (2.2) are non-zero for i = j and zero for i ≠ j, forming a diagonal design matrix.
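As a concrete illustration of the uncoupled structure argued for here, the sketch below (a hypothetical helper, not from the thesis) classifies a numerical design matrix using the standard axiomatic-design categories: diagonal (uncoupled), triangular (decoupled), and otherwise coupled. The tolerance and example values are assumptions.

```python
import numpy as np

def classify_design_matrix(M, tol=1e-9):
    """Classify a square design matrix M (M_ij = dF_i/dP_j) per axiomatic design:
    diagonal -> uncoupled, triangular -> decoupled, otherwise -> coupled."""
    M = np.asarray(M, dtype=float)
    if M.shape[0] != M.shape[1]:
        return "not ideal: numbers of FRs and DPs differ"
    off_diag = M - np.diag(np.diag(M))
    if np.all(np.abs(off_diag) < tol):
        return "uncoupled"
    if np.all(np.abs(np.triu(M, k=1)) < tol) or np.all(np.abs(np.tril(M, k=-1)) < tol):
        return "decoupled"
    return "coupled"

# The design of Section 2.2 aims for a diagonal matrix, e.g. with n = 2 old FRs:
M_design = np.diag([1.0, 1.0, 1.0, 1.0])   # F1..Fn, Fn+1, Fn+2 each driven by one DP
print(classify_design_matrix(M_design))    # -> "uncoupled"
```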


2.3 Mathematical Optimization

A single-objective optimization problem can be formulated as seeking a vector χ = [χ1 χ2 . . . χn]⊤, belonging to a subset X of the n-dimensional space Rⁿ, that minimizes a single objective function Λ(·). This is often written as

min_χ Λ(χ),    (2.3)

subject to

εp(χ) = 0,  p = 1 . . . nε,
ιq(χ) ≤ 0,  q = 1 . . . nι,
χ ∈ X ⊆ Rⁿ,    (2.4)

where εp(χ) is an equality constraint and ιq(χ) is an inequality constraint on the vector χ. This formulation is also called constrained optimization, since the search space of the vector is constrained by the equality and inequality relations stated above. There are numerous methods to solve such an optimization problem, such as the steepest descent, Newton's, and conjugate direction methods [37]. Depending on the type of the objective function, a suitable method can be utilized to find the solution. The principles of optimal design are based upon mathematical optimization [36]. Any design vector χ that satisfies the equality and inequality relations is a feasible design. The optimal design corresponds to the feasible design vector that minimizes the objective function. Most real engineering problems extend beyond the simple case of single-objective optimization. The rest of this section briefly reviews the basics of multi-objective optimization.


A multi-objective optimization problem is posed as

\min_{\chi} \; \Lambda(\chi) = [\Lambda_1(\chi) \;\; \Lambda_2(\chi) \;\; \ldots \;\; \Lambda_L(\chi)]^\top    (2.5)

subject to the same constraints stated in (2.4). The number of objective functions is L. In contrast to single-objective optimization, the objectives in multi-objective optimization must be traded off against one another. Optimality cannot be defined componentwise for a vector of objectives; it must instead be agreed upon conceptually, and in this sense optimality is more of a concept than a definition [38]. Pareto optimality (also called noninferiority) is the most predominant such concept in multi-objective optimization. A Pareto optimal solution is one in which no objective can be improved without worsening at least one of the other objectives [39].

There are many approaches to the multi-objective optimization problem; a review of the entire literature in this field is beyond the scope of this thesis. For a survey of the different methods, see [38]. The most common method for multi-objective optimization is the weighted sum method, in which the weighted sum of the objectives forms a single objective. That is,

H(\chi) = \sum_{i=1}^{L} w_i \Lambda_i(\chi),    (2.6)

where wi is a positive scalar. Minimizing (2.6) is sufficient but not necessary for Pareto optimality [38]. Throughout this thesis, we shall use the weighted sum method whenever the problem is posed as a multi-objective optimization problem.
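As a concrete illustration of the weighted-sum scalarization in (2.6), the following MATLAB sketch combines two toy objective functions into a single objective and minimizes it with fminsearch. The objective functions, weights, and starting point are hypothetical placeholders and are not the measures developed later in this thesis.

% Weighted-sum scalarization of two toy objectives (illustrative only).
Lambda1 = @(chi) (chi(1) - 1)^2 + chi(2)^2;        % hypothetical objective 1
Lambda2 = @(chi) chi(1)^2 + (chi(2) - 2)^2;        % hypothetical objective 2
w = [0.5 0.5];                                     % positive weights

H = @(chi) w(1)*Lambda1(chi) + w(2)*Lambda2(chi);  % single objective, Eq. (2.6)
chiOpt = fminsearch(H, [0 0]);                     % minimizer; Pareto optimal for these weights

Because the weights are strictly positive, the minimizer of H is Pareto optimal, in agreement with the sufficiency statement above.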


2.4 Summary

The general theory of design for vision-guided manipulation was introduced in this chapter. An axiomatic-design theoretic approach was adopted, and the design axioms and theorems were discussed. The functional requirements of the design in the functional domain were specified such that the Independence Axiom is not violated. The design parameters in the physical domain were specified accordingly, and it was shown that an uncoupled ideal design can be achieved. The multi-objective optimization framework, within which the optimal design can be found, was also briefly reviewed.


Chapter 3

Feature Design for Optimum Visual Measures

3.1 Introduction

Features on an object can be designed such that the visual measures are improved. Imagine a camera with a fixed viewpoint and constant optical settings, such as focal length and aperture. It is desired that the image of an object lie within the field-of-view of the camera. In vision sensor planning, it is assumed that the object to be viewed is fixed and that the camera pose and optical settings can be changed to view the object. In this sense, a generalized viewpoint is defined to include the camera position, viewing direction, and optical settings [40]. Other visual measures, such as focus and resolution, can be improved by planning the optical settings of the camera.

The inverse of the vision sensor planning problem is an interesting and challenging design problem: the object features are to be designed such that, when a camera moves along a known path, the features satisfy the visual measures. It would be fairly easy to design the object for one given generalized viewpoint, but if design for vision-guided manipulation is considered, one cannot assume a constant viewpoint. Moreover, many experimental setups do not allow real-time adjustment of the camera optical settings. The design challenge therefore lies in keeping the optical settings fixed while the camera moves along a known path. For example, if the camera moves close to the features, the focus constraint might be violated.

The functional requirements of an optimal design were discussed in Chapter 2. It is worth noting that the feature resolution constraint forms one of the functional requirements, while a combination of the focus and field-of-view measures forms another. The latter two measures apply to feature points, while resolution depends on the feature type and its characteristics; for example, if hole features are used, the diameter of the hole must be resolved on the image sensor. Since focus and field-of-view are defined on feature points, they are integrated with the camera-to-object relative-motion resolvability measure (see Chapter 4) to form the functional requirement discussed in Section 2.2.2.

3.2 Previous Work on Sensor Planning

Sensors are the input interface between the workspace and the processing units. Whether it is a laser range finder or a camera, a sensor must be planned to provide useful data about the scene objects in any application that incorporates machine vision. The process of determining the parameters of a sensor in order to improve the performance of a sensor-based task is referred to as sensor planning; view planning, for example, is an integral part of vision sensor planning. Many sensor-based applications require a sensor to be planned before reliable information can be obtained. Object inspection and scene reconstruction are two of the most important applications in which the output of a sensor must be accurately measured. Other applications of sensor planning in computer vision include sensor-based robot path planning [41, 42], environment modeling [43, 44, 45, 46, 47], autonomous exploration [48], and 3D object search [49].

Our aim is to represent the problem of design for vision-guided manipulation as an inverse problem to vision sensor planning. Sensor planning is a mature field, and we shall not repeat the extensive surveys available in the literature, but we provide enough detail to make the inverse-problem paradigm easy to follow. An extensive survey of sensor planning in computer vision is presented by Tarabanis et al. [50]; they focus mainly on the inspection application and leave other applications, such as object and scene reconstruction, unaddressed. Another survey on sensor planning, with a focus on scene reconstruction, was published more recently by Scott et al. [51]. They do not limit their sensor type to cameras, but also survey research efforts that use laser range finders, and their formulation of the problem is nicely crafted; the interested reader is encouraged to examine this paper to gain further insight into sensor planning.

The position and orientation of the camera is called the camera viewpoint. When an object must be recognized, the first step is to adjust the camera viewpoint such that the object is within the field of view of the camera. Adding the adjustable set of camera optical settings, such as the focal length and aperture, to the camera viewpoint yields a generalized viewpoint. In order to clearly specify the inverse problem of sensor planning, we adopt the Machine Vision Planner (MVP) scheme introduced by Tarabanis et al. [40]. The sensor planning task of the MVP is to find the optimum generalized viewpoint of a camera with respect to a goal function defined on a set of four measures, namely field-of-view, focus, resolution, and visibility. Their resolution constraint is based on a more accurate model than the one used in the early work of Cowan and Kovesi [52], while the rest of the measures are essentially the same. In addition to completing the constraint of Cowan and Kovesi, Tarabanis and Tsai geometrically analyzed the optical measures [53]. The important contribution of the MVP system over the sensor planning of Cowan and Kovesi is that it recasts the problem in an optimization formalism, in which the optimum generalized viewpoint is found with respect to a goal function. The choice of the cost function is very important and can alter the optimum solution significantly.

On the other hand, if the generalized viewpoint of the camera is fixed, the pose of an object may be changed to attain a better visual image. More interestingly, if an object can be redesigned, it is desirable to optimize the design with respect to a cost function based on the visual measures. In this sense, the problem of design for vision-guided manipulation is the inverse problem of sensor planning. We shall also adopt an optimization formalism to find the optimum design. If the camera trajectory is available and the goal is to find the optimum object design for all camera viewpoints on the trajectory, we have a multi-objective optimization problem at hand.

3.3 Visual Measures

The geometric parameters of the camera include the position of the camera frame, rC = [xC yC zC]⊤, and its viewing direction νC, found from the frame orientation angles [θxC θyC θzC]⊤. The optical parameters of the camera are the distance dC from the back nodal point to the image plane, the focal length fC, and the entrance pupil diameter aC of the lens.

3.3.1 Field-of-view

The field of view of a camera is limited by the minimum dimension of the sensor plane. The field-of-view angle α is given by α = 2 tan−1(Imin /2dC), where Imin is the minimum dimension (width or height) of the active area of the sensor plane, and dC is the distance from the back nodal point of the lens to the image plane. The locus of generalized viewpoints satisfying the field-of-view constraint is found to be [53]

(r_K - r_C) \cdot \nu_C - \|r_K - r_C\| \cos(\alpha/2) \ge 0,    (3.1)

where rC is the position vector of the camera frame (front nodal point), νC is the unit vector along the optical axis in the viewing direction, and rK is the position vector of the apex of the field-of-view cone, given by rK = rf − R0 νC, where R0 = Rf /sin(α/2), rf is the position vector of the center of a sphere circumscribing the object features, and Rf is the radius of this sphere [40].

Since we are interested in finding the feature points that lie within the field of view, we rewrite the constraint as described below. Suppose that rA and rB denote the position vectors of feature points A and B measured from the origin of the world frame O. For a feature point to lie inside the field-of-view cone, the vector that connects the feature point to the camera nodal point must make an angle smaller than half of the field-of-view angle α with the viewing direction. Fig. 3.1 illustrates the field-of-view cone with feature point A inside and feature point B outside the cone. The cone can be considered to be centred around the viewing direction νC. The angle between the viewing direction and the vector that connects the feature point to the front nodal point of the camera is found from

\cos(\beta_A) = \frac{(r_C - r_A) \cdot \nu_C}{\|r_C - r_A\|},    (3.2)

where βA is the angle between the viewing direction νC and the vector that connects feature point A to the apex of the cone. Similarly, βA and rA can be replaced with βB and rB in (3.2) to find the angle associated with feature point B. For a feature point to lie within the field-of-view cone, βA ≤ α/2, or equivalently cos(βA) ≥ cos(α/2), since the angles are smaller than π/2. Therefore, the field-of-view constraint for the feature design problem is given by the inequality

(r_C - r_A) \cdot \nu_C - \|r_C - r_A\| \cos(\alpha/2) \ge 0.    (3.3)

Figure 3.1: Feature point A is inside and feature point B is outside the field-of-view cone: βA < α/2 and βB > α/2.

One should note that the field-of-view measure can be considered an objective function and not a mere boolean expression. A feature point that is close to the surface of the field-of-view cone is more likely to fall outside the cone with a slight motion of the camera; such a feature point produces a small value of the field-of-view measure. In contrast, a feature point that lies on the viewing direction produces the largest value of the field-of-view measure.
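To make the constraint in (3.3) concrete, the following MATLAB sketch evaluates the left-hand side of (3.3) for a single feature point, following the sign convention of the text: a positive value indicates that the point satisfies the field-of-view constraint, and a larger value indicates a larger margin, which is how the measure can be treated as an objective rather than a boolean. The function name and interface are illustrative, not taken from the thesis code.

function m = fovMeasure(rA, rC, nuC, alpha)
% FOVMEASURE  Field-of-view margin of a feature point, left-hand side of Eq. (3.3).
%   rA    3x1 feature point position in the world frame [mm]
%   rC    3x1 camera (front nodal point) position in the world frame [mm]
%   nuC   3x1 unit vector along the optical axis (viewing direction)
%   alpha field-of-view angle [rad], alpha = 2*atan(Imin/(2*dC))
    v = rC - rA;                              % vector appearing in Eq. (3.3)
    m = dot(v, nuC) - norm(v)*cos(alpha/2);   % m >= 0 satisfies Eq. (3.3)
end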

3.3.2 Focus

For a real lens, only one plane is in exact focus. However, the concept of the depth-of-field range can be used to find points that are not blurred too much and can be reliably used in vision algorithms. The feature points must be designed such that, when the camera moves along a desired path, all of the feature points remain within the region that satisfies the depth-of-field constraint. This region is the band between the far and near planes within which the image of a feature is not excessively blurred. The far and near planes are perpendicular to the viewing direction, at distances D1 and D2 [52]:

D_1 = \frac{a_C f_C d_C}{a_C (d_C - f_C) - c\, f_C},    (3.4)

D_2 = \frac{a_C f_C d_C}{a_C (d_C - f_C) + c\, f_C},    (3.5)

where c is the limiting blur circle diameter that is considered acceptable, aC is the diameter of the entrance pupil of the lens system, dC is the distance from the back nodal point of the lens to the image plane, and fC is the intrinsic focal length of the lens [40]. For an arbitrary feature point A, the depth-of-field constraint can be written in vector form as follows [52, 53]:

D_2 \le (r_A - r_C) \cdot \nu_C \le D_1,    (3.6)

where rC is the position vector of the camera frame, rA is the position vector of the feature point in the world frame, and νC is the unit vector along the optical axis in the viewing direction. Fig. 3.2(a) shows the acceptable depth-of-field range within the field-of-view cone. Fig. 3.2(b) illustrates the concurrent acceptable region (shown in dark gray) for feature point locations when a camera moves along a desired path. It should be noted that the points within this region do not possess equal weights; the optimum feature point locations must be derived using multi-objective optimization techniques, as will be discussed later in Section 3.4.
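A minimal MATLAB sketch of the depth-of-field test in (3.4)-(3.6) is given below; the helper name and argument order are illustrative.

function [inDOF, D1, D2] = depthOfFieldCheck(rA, rC, nuC, aC, fC, dC, c)
% DEPTHOFFIELDCHECK  Depth-of-field constraint of Eqs. (3.4)-(3.6).
%   aC entrance pupil diameter, fC focal length, dC lens-to-image distance,
%   c  acceptable blur circle diameter (all in consistent units, e.g. mm).
    D1 = (aC*fC*dC) / (aC*(dC - fC) - c*fC);   % far plane distance, Eq. (3.4)
    D2 = (aC*fC*dC) / (aC*(dC - fC) + c*fC);   % near plane distance, Eq. (3.5)
    depth = dot(rA - rC, nuC);                 % depth of the point along the viewing direction
    inDOF = (depth >= D2) && (depth <= D1);    % Eq. (3.6)
end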

3.3.3 Resolution

The pixel resolution constraint is the last visual constraint that will be considered. It indicates the smallest size of an object feature that can be resolved on the image sensor.


Figure 3.2: Simultaneous illustration of the field-of-view and the depth-of-field measures (a) one camera location is shown. The acceptable region is emphasized with gray. The far and near planes D1 and D2 , respectively, are perpendicular to the viewing direction νC , and (b) The concurrent acceptable region is shown with dark gray for four different camera positions and viewing directions. This region is formed from the intersection of the acceptable regions for each camera.

The vectorized inequality that determines the locus of camera poses and optical settings for which a line-segment feature has a length of at least ω on the image sensor is given in [53]:

\frac{\left\| \left[ (r_A - r_C) \times u_\delta \right] \times \nu_C \right\|}{\left[ (r_A - r_C) \cdot \nu_C \right] \left[ (r_B - r_C) \cdot \nu_C \right]} - \frac{\omega}{d_C\, \delta_{AB}} \ge 0,    (3.7)

where rA and rB are the position vectors of the end points of the line segment, rC is the position vector of the camera frame in the world frame, dC is the distance from the back nodal point of the lens to the image plane, νC is the unit vector along the optical axis in the viewing direction, uδ is the unit vector along AB, δAB is the length of the line segment AB, and fC is the intrinsic focal length of the lens, that is, the focal length of the lens for an object at infinity.

For hole features, the above relation cannot be used, because the different diameters of the hole map onto different line segments on the sensor plane. In fact, the image of a circle in 3D Cartesian space maps to a 2D ellipse on the sensor plane. To ensure pixel resolution for a hole, one only needs to guarantee that its projected ellipse is resolvable; in other words, that the smallest diameter of the ellipse has a length of ω or greater. A rather straightforward method to find the projected ellipse is obtained by applying linear algebra and projective geometry: the concepts of conics and quadrics are used to find the diameters of the projected ellipse.

Quadrics in P³ are analogous to conics in P². A 4 × 4 matrix Q defines a quadric if Q is symmetric. Assume that

Q = \begin{bmatrix}
q_{11} & q_{12} & q_{13} & q_{14} \\
q_{21} & q_{22} & q_{23} & q_{24} \\
q_{31} & q_{32} & q_{33} & q_{34} \\
q_{41} & q_{42} & q_{43} & q_{44}
\end{bmatrix}.    (3.8)

Matrix Q defines a quadric if q12 = q21, q13 = q31, q14 = q41, q23 = q32, q24 = q42, and q34 = q43. The quadric equation is then written as

P^\top Q P = 0,    (3.9)

where P = [X Y Z 1]⊤ is a point in P³. We can also write the quadric equation in the familiar form aX² + bY² + cZ² + dXY + eYZ + fXZ + gX + hY + iZ + j = 0. In this case the quadric matrix is

Q = \begin{bmatrix}
a & d/2 & f/2 & g/2 \\
d/2 & b & e/2 & h/2 \\
f/2 & e/2 & c & i/2 \\
g/2 & h/2 & i/2 & j
\end{bmatrix}.    (3.10)

Consider a hole feature with radius Rh, centered at Ph = [Xh Yh Zh]⊤. The equation of the sphere that encompasses the hole can be written as

0 = (X - X_h)^2 + (Y - Y_h)^2 + (Z - Z_h)^2 - R_h^2    (3.11)
  = X^2 + Y^2 + Z^2 - 2 X_h X - 2 Y_h Y - 2 Z_h Z + X_h^2 + Y_h^2 + Z_h^2 - R_h^2.    (3.12)

The quadric matrix of this sphere is

Q = \begin{bmatrix}
1 & 0 & 0 & -X_h \\
0 & 1 & 0 & -Y_h \\
0 & 0 & 1 & -Z_h \\
-X_h & -Y_h & -Z_h & X_h^2 + Y_h^2 + Z_h^2 - R_h^2
\end{bmatrix}.    (3.13)

Under a transformation M that maps points in P³ to points in P², a quadric Q is transferred to a conic C according to the following formula [54]:

C^{-1} = M Q^{-1} M^\top.    (3.14)

When a camera is used, the transformation M is the camera matrix; the camera matrix and the intrinsic and extrinsic camera matrices are presented in (A.3)-(A.7) in Appendix A. From (3.13), the inverse of the quadric matrix is found to be

Q^{-1} = \frac{1}{R_h^2} \begin{bmatrix}
R_h^2 - X_h^2 & -X_h Y_h & -X_h Z_h & -X_h \\
-X_h Y_h & R_h^2 - Y_h^2 & -Y_h Z_h & -Y_h \\
-X_h Z_h & -Y_h Z_h & R_h^2 - Z_h^2 & -Z_h \\
-X_h & -Y_h & -Z_h & -1
\end{bmatrix}.    (3.15)

From (3.14), (3.15), and the camera matrix given in (A.7), the inverse of the conic matrix C is

C^{-1} = \frac{1}{R_h^2} \begin{bmatrix}
f_C^2 (R_h^2 - X_h^2) & -f_C^2 X_h Y_h & -f_C X_h Z_h \\
-f_C^2 X_h Y_h & f_C^2 (R_h^2 - Y_h^2) & -f_C Y_h Z_h \\
-f_C X_h Z_h & -f_C Y_h Z_h & R_h^2 - Z_h^2
\end{bmatrix}.    (3.16)

The conic matrix C is found by inverting (3.16):

C = \frac{1}{X_h^2 + Y_h^2 + Z_h^2 - R_h^2} \begin{bmatrix}
\dfrac{Y_h^2 + Z_h^2 - R_h^2}{f_C^2} & -\dfrac{X_h Y_h}{f_C^2} & -\dfrac{X_h Z_h}{f_C} \\[1ex]
-\dfrac{X_h Y_h}{f_C^2} & \dfrac{X_h^2 + Z_h^2 - R_h^2}{f_C^2} & -\dfrac{Y_h Z_h}{f_C} \\[1ex]
-\dfrac{X_h Z_h}{f_C} & -\dfrac{Y_h Z_h}{f_C} & X_h^2 + Y_h^2 - R_h^2
\end{bmatrix}.    (3.17)

Hence, the conic equation in the image plane is

\frac{Y_h^2 + Z_h^2 - R_h^2}{f_C^2}\, x^2 + \frac{X_h^2 + Z_h^2 - R_h^2}{f_C^2}\, y^2 - 2\,\frac{X_h Y_h}{f_C^2}\, xy - 2\,\frac{X_h Z_h}{f_C}\, x - 2\,\frac{Y_h Z_h}{f_C}\, y + X_h^2 + Y_h^2 - R_h^2 = 0,    (3.18)

where p = [x y 1]⊤ is a point on the perimeter of the projected ellipse. Note that this is the projection of a point P = [X Y Z 1]⊤ on the perimeter of the hole feature in P³ onto the image plane P², according to the camera projection model p = MP.

So far, we have found the conic of the projected ellipse. In order to guarantee resolution, the smallest diameter of the ellipse must be at least equal to a predefined value. The principal axes of the ellipse are found using linear algebra. Let the conic matrix be

C = \begin{bmatrix} C_T & d \\ d^\top & e \end{bmatrix},    (3.19)

where CT is the top-left 2 × 2 block, d is a 2 × 1 vector, and e is a scalar. One can find the canonical form of the ellipse by calculating the rotation and translation required to transform the canonical ellipse into the original ellipse. Let the canonical ellipse have a conic matrix CC. If the transformation that maps the canonical conic to the original conic is called H, the original conic is found from C = H^{-⊤} CC H^{-1}, or equivalently CC = H⊤ C H. The transformation H can be expressed as the product of two matrices, H = HR HT, where HR is the rotation matrix and HT is the translation matrix. The rotation is an eigenproblem, i.e., CT can be diagonalized in the form CT = R ΛD R⊤, where ΛD is a diagonal matrix. The rotation matrix is

H_R = \begin{bmatrix} R & 0 \\ 0^\top & 1 \end{bmatrix}.    (3.20)

The rotation matrix in (3.20) rotates the points of an ellipse A, which is aligned with the canonical ellipse, i.e., p = HR pA. The conic equation can then be written as

p^\top C p = p_A^\top H_R^\top C H_R\, p_A = p_A^\top C_A\, p_A = 0,

where

C_A = H_R^\top C H_R = \begin{bmatrix} \Lambda_D & R^\top d \\ d^\top R & e \end{bmatrix}.    (3.21)

The canonical ellipse is translated to ellipse A according to

p_A = H_T\, p_C = \begin{bmatrix} I & t \\ 0^\top & 1 \end{bmatrix} p_C.

The conic form of the ellipse is then calculated as

p_A^\top C_A\, p_A = p_C^\top H_T^\top C_A H_T\, p_C = p_C^\top C_C\, p_C = 0,

where CC = HT⊤ CA HT. Writing CA in the block form with a = R⊤ d and α = e, the canonical conic is found to be

C_C = \begin{bmatrix} I & 0 \\ t^\top & 1 \end{bmatrix}
\begin{bmatrix} \Lambda_D & a \\ a^\top & \alpha \end{bmatrix}
\begin{bmatrix} I & t \\ 0^\top & 1 \end{bmatrix}
= \begin{bmatrix} \Lambda_D & \Lambda_D t + a \\ t^\top \Lambda_D + a^\top & t^\top \Lambda_D t + 2 a^\top t + \alpha \end{bmatrix}
= \begin{bmatrix} \Lambda_D & 0 \\ 0^\top & c \end{bmatrix},

where t = -\Lambda_D^{-1} a = -\Lambda_D^{-1} R^\top d.

The resolution constraint for a hole feature can then be defined as

\min(\lambda_k) \ge \omega,    (3.22)

where λk, k = 1, 2, are the eigenvalues of CT (the diagonal elements of ΛD) and ω is the smallest length of the ellipse that can be resolved on the image plane.
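A short numeric sketch of this hole-resolution check is given below. It assumes a simplified 3 × 4 camera matrix containing only the focal length (unit pixel size and zero principal point), which is consistent with the form of (3.16); all values are illustrative.

% Resolution check for a circular hole, following Eqs. (3.13)-(3.22) (sketch).
fC = 3.5;                         % focal length [mm] (illustrative)
Ph = [2; 1; 30];  Rh = 3.7;       % hole centre in the camera frame [mm] and hole radius [mm]

Xh = Ph(1); Yh = Ph(2); Zh = Ph(3);
Q = [ 1   0   0  -Xh;             % quadric of the circumscribing sphere, Eq. (3.13)
      0   1   0  -Yh;
      0   0   1  -Zh;
     -Xh -Yh -Zh  Xh^2+Yh^2+Zh^2-Rh^2 ];

M = [fC 0 0 0; 0 fC 0 0; 0 0 1 0];   % simplified camera matrix (assumption)

Cinv = M / Q * M';                % C^{-1} = M Q^{-1} M', Eq. (3.14)
C    = inv(Cinv);                 % conic of the projected ellipse
CT   = C(1:2, 1:2);               % top-left 2x2 block, Eq. (3.19)

lambda = eig(CT);                 % diagonal elements of Lambda_D
omega  = 0.05;                    % smallest resolvable length (assumption)
resolvable = min(lambda) >= omega;    % resolution constraint, Eq. (3.22)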

3.4 Design for Visual Measures

3.4.1 Problem statement

The problem of optimal design for visual measures is stated as the optimal design of features (feature point locations and feature characteristics) in the task space, in order to maximize the field of view, focus, and resolution of the features for a given 6D camera trajectory profile Γ and fixed camera intrinsic parameters ν. A multi-objective optimization framework is adopted, where for each discrete point on Γ, three different measures are determined for each feature. The vector of design variables (a point in design space) Tχ ∈ R^{4N} is defined as the 3D positions of the N feature points in the task frame (3N elements) plus one characteristic for each of the N features, resulting in a vector of 4N elements. The problem is posed as

\max_{{}^T\chi} \; [\Lambda_{FOV}({}^T\chi) \;\; \Lambda_{FOC}({}^T\chi) \;\; \Lambda_{RES}({}^T\chi)],    (3.23)

where Λ_TYP(Tχ) = [Λ_{TYP,1}(Tχ) Λ_{TYP,2}(Tχ) . . . Λ_{TYP,L}(Tχ)]⊤, TYP ∈ {FOV, FOC, RES}, L is the number of discrete points on Γ = {γ1, . . . , γL}, the trajectory of the camera frame expressed in the task frame, and γk = [xCk yCk zCk θxC θyC θzC]⊤ is the pose of the camera at instant k.

3.4.2 Multi-objective optimization

As can be seen from (3.23), at each γk the problem is already a multi-objective problem with 3 objectives. With L different points on the trajectory, there are 3 × L objectives for which an optimum solution must be found. Further analysis of the objectives reveals that the objective function in (3.23) can be decoupled into two naturally different objective functions, since the effective design variables can be separated. This was also discussed in terms of functional requirements and design parameters in Chapter 2. Decoupling leads to an optimal design solution, as well as reducing the order of the multi-objective optimization problem.

In order to solve the multi-objective optimization problem in (3.23), one can combine the max-min method and the weighted sum strategy. It is a reasonable assumption to consider equal weights for objective functions that have the same nature. Specifically, let

g_{TYP}({}^T\chi) = \sum_{i=1}^{L} \Lambda_{TYP,i}({}^T\chi),    (3.24)

where g_TYP(Tχ) is a scalar and TYP ∈ {FOV, FOC, RES}. Having done so, the multi-objective optimization problem with 3 × L objectives is reduced to only 3 objectives:

\max_{{}^T\chi} \; [g_{FOV}({}^T\chi) \;\; g_{FOC}({}^T\chi) \;\; g_{RES}({}^T\chi)].    (3.25)

The remaining objectives are of naturally different types and cannot easily be combined in a weighted sum strategy, unless great care is taken in assigning adaptive weights. To avoid further complications, one can adopt a max-min strategy, in which the minimum of the objective functions is maximized:

\max_{{}^T\chi} \; \min\left\{ g_{FOV}({}^T\chi),\; g_{FOC}({}^T\chi),\; g_{RES}({}^T\chi) \right\}.    (3.26)
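A minimal MATLAB sketch of the max-min scalarization in (3.26) is shown below. The three g functions are toy stand-ins for the summed field-of-view, focus, and resolution measures of (3.24), and fminsearch is used only to keep the example short and unconstrained.

% Max-min scalarization of the three visual objectives, Eq. (3.26) (sketch).
gFOV = @(chi) -sum((chi - 1).^2);    % stand-in for the summed field-of-view measure
gFOC = @(chi) -sum((chi - 2).^2);    % stand-in for the summed focus measure
gRES = @(chi) -sum((chi - 3).^2);    % stand-in for the summed resolution measure

maxMin = @(chi) min([gFOV(chi), gFOC(chi), gRES(chi)]);
chiOpt = fminsearch(@(chi) -maxMin(chi), [0 0 0]);   % maximize the minimum objective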

3.5 Summary

Three different visual measures have been formulated for the optimal design of features. The first visual measure is the field-of-view, which is critical when the motion of the camera frame involves large changes in the viewing direction. The second visual measure is the focus of feature points; only points on one plane are in exact focus, but the depth-of-field concept has been used in the formulation to supply a feasible range for the location of the feature points. The third measure is the pixel resolution. Besides the pixel resolution measure formulated for line-segment features, a new pixel resolution measure has been developed for circular hole features, which map to ellipses on the image plane. The three visual measures are calculated along a known camera trajectory using a weighted-sum strategy to obtain the objective functions used in a multi-objective optimization formalism. Given the CAD model of an object and the trajectory, the design parameters are optimized in an off-line fashion. Simulation results will be presented in Chapter 5.


Chapter 4

Feature Design for Optimum Directional Motion Resolvability

4.1 Introduction

In this chapter, we approach the problem of feature design from a different perspective, in which an optimal solution is sought to improve the sensitivity to motion resolvability. Mechanical design is an under-constrained problem, and the relevant design constraints often leave the designer with some degree of flexibility in designing features, e.g., in choosing feature locations such as the coordinates of corners. Such a design has not previously been optimized for directional motion resolvability. Given the intrinsic camera calibration parameters, the motion profile, and the image Jacobian matrix, a directional camera motion resolvability measure is defined and optimized within a constrained multi-objective optimization framework. For each location on the camera trajectory, where the pose of the camera and its time derivative (velocity screw) are available, the measure is evaluated as a separate objective. The goal is to find the Pareto optimal design, i.e., to find the locations of the feature points in the task frame such that the measure is maximized for all points on the trajectory.

Solving a multi-objective optimization problem within a single-objective framework, by combining the different objectives into one objective function, does not always result in an optimal solution. The main reason is that the combined objective function may have many local maxima, rendering gradient-based methods ineffective. Instead, we successfully utilize a constrained multi-objective approach to solve the under-constrained design problem. Simulation results show that the motion of the camera is better resolved for the optimally designed object.

In order to mathematically state the problem, some basic concepts should be reviewed first. These concepts include the image Jacobian matrix, vision resolvability, motion perceptibility, and directional dexterity.

4.1.1 Image Jacobian and image singularity

The image Jacobian is a matrix that relates the velocity of the task frame (or camera frame) to the velocity of the projections of the feature points onto the sensor plane. The early notion of the image Jacobian was the sensitivity matrix, first introduced by Weiss et al. [55]. Feddema et al. [22] provided a detailed derivation of the image Jacobian and further studied a practical method to find its inverse when three feature points in the camera frame are projected onto the image plane (in this case the image Jacobian is a 6 × 6 matrix); they used the inverse image Jacobian for real-time vision-guided control of a manipulator. Another notion of the image Jacobian is the interaction matrix, as used by Espiau et al. [4]. For the sake of consistency throughout this thesis, we define the image Jacobian as follows. Consider N feature points expressed in the task frame {T} as

{}^T P_j = [{}^T X_j \;\; {}^T Y_j \;\; {}^T Z_j]^\top, \quad j = 1 \ldots N.    (4.1)

We assume that the feature points are rigidly attached to the task frame; therefore,

{}^T \dot{P}_j = 0.    (4.2)

The Cartesian coordinates of the feature points are expressed as {}^C P_j^k = [{}^C X_j^k \; {}^C Y_j^k \; {}^C Z_j^k]^\top in the camera frame {C}. In our problem, the motion of the camera frame is known, i.e., at time instant k the linear velocity of the camera frame, {}^C V_C^k = [\dot{x}_C^k \; \dot{y}_C^k \; \dot{z}_C^k]^\top, and the absolute angular velocity of the camera frame, {}^C \Omega_C^k = [\omega_{xC}^k \; \omega_{yC}^k \; \omega_{zC}^k]^\top = [\dot{\theta}_{xC}^k \; \dot{\theta}_{yC}^k \; \dot{\theta}_{zC}^k]^\top (expressed in the current camera frame), are known. The velocity of the feature point {}^C P_j^k can be written as

{}^C \dot{P}_j^k = -\,{}^C V_C^k - {}^C \Omega_C^k \times {}^C P_j^k.    (4.3)

Expanding (4.3) leads to

{}^C \dot{X}_j^k = -\dot{x}_C^k - {}^C Z_j^k\, \omega_{yC}^k + {}^C Y_j^k\, \omega_{zC}^k,
{}^C \dot{Y}_j^k = -\dot{y}_C^k + {}^C Z_j^k\, \omega_{xC}^k - {}^C X_j^k\, \omega_{zC}^k,    (4.4)
{}^C \dot{Z}_j^k = -\dot{z}_C^k - {}^C Y_j^k\, \omega_{xC}^k + {}^C X_j^k\, \omega_{yC}^k.

The projection of the feature point from the camera frame {C} onto the image frame {S} can be expressed as

{}^S x_j^k = \frac{f_C\, {}^C X_j^k}{s_x\, {}^C Z_j^k},    (4.5)

{}^S y_j^k = \frac{f_C\, {}^C Y_j^k}{s_y\, {}^C Z_j^k},    (4.6)

where fC is the focal length of the camera, and sx and sy are the horizontal and vertical dimensions of the pixels, respectively (see Fig. 4.1). Differentiating (4.5) and (4.6) results in

{}^S \dot{x}_j^k = \frac{f_C\, {}^C \dot{X}_j^k}{s_x\, {}^C Z_j^k} - \frac{{}^C \dot{Z}_j^k}{{}^C Z_j^k}\, {}^S x_j^k,    (4.7)

{}^S \dot{y}_j^k = \frac{f_C\, {}^C \dot{Y}_j^k}{s_y\, {}^C Z_j^k} - \frac{{}^C \dot{Z}_j^k}{{}^C Z_j^k}\, {}^S y_j^k.    (4.8)

Figure 4.1: Illustration of the feature point expression in camera frame, task frame, and the projection onto the sensor plane (image plane).

Combining (4.4), (4.7), and (4.8), the image Jacobian equation for a single feature point can be written as

\begin{bmatrix} {}^S \dot{x}_j^k \\ {}^S \dot{y}_j^k \end{bmatrix}
= \begin{bmatrix} J_{P,j}^k \;\big|\; J_{O,j}^k \end{bmatrix}
\begin{bmatrix} \dot{x}_C^k \\ \dot{y}_C^k \\ \dot{z}_C^k \\ \omega_{xC}^k \\ \omega_{yC}^k \\ \omega_{zC}^k \end{bmatrix},    (4.9)

where

J_{P,j}^k = \begin{bmatrix}
-\dfrac{f_C}{s_x\, {}^C Z_j^k} & 0 & \dfrac{f_C\, {}^C X_j^k}{s_x\, ({}^C Z_j^k)^2} \\[2ex]
0 & -\dfrac{f_C}{s_y\, {}^C Z_j^k} & \dfrac{f_C\, {}^C Y_j^k}{s_y\, ({}^C Z_j^k)^2}
\end{bmatrix},    (4.10)

and

J_{O,j}^k = \begin{bmatrix}
\dfrac{f_C\, {}^C X_j^k\, {}^C Y_j^k}{s_x\, ({}^C Z_j^k)^2} &
-\dfrac{f_C}{s_x}\left( 1 + \left( \dfrac{{}^C X_j^k}{{}^C Z_j^k} \right)^2 \right) &
\dfrac{f_C\, {}^C Y_j^k}{s_x\, {}^C Z_j^k} \\[2ex]
\dfrac{f_C}{s_y}\left( 1 + \left( \dfrac{{}^C Y_j^k}{{}^C Z_j^k} \right)^2 \right) &
-\dfrac{f_C\, {}^C X_j^k\, {}^C Y_j^k}{s_y\, ({}^C Z_j^k)^2} &
-\dfrac{f_C\, {}^C X_j^k}{s_y\, {}^C Z_j^k}
\end{bmatrix}.    (4.11)

The image Jacobian equation can also be represented in the compact form

{}^S \dot{p}_j^k = J_j^k(\Phi) \begin{bmatrix} {}^C V_C^k \\ {}^C \Omega_C^k \end{bmatrix},    (4.12)

where {}^S \dot{p}_j^k = [{}^S \dot{x}_j^k \; {}^S \dot{y}_j^k]^\top contains the projected velocity coordinates of the jth feature point in the sensor frame, and J_j^k(\Phi) is the image Jacobian of the jth feature point with Φ = (fC, sx, sy, {}^C X_j^k, {}^C Y_j^k, {}^C Z_j^k). The image Jacobian of a single feature point is a function of the camera intrinsic parameters and the expression of the feature point in the camera frame. It is worth noting that this Jacobian is a 2 × 6 matrix that relates the velocity of the camera frame to the rate of change in the sensor frame for each feature point. A similar approach may be taken to relate the rate of change of the task frame to the rate of change in the sensor frame [56]. However, expressing a point and its time derivative in the camera frame is preferable for eye-in-hand configurations; since we shall use an eye-in-hand configuration, it is desirable to have the final formulation in the camera frame. If N feature points are available, (4.12) generalizes to

\begin{bmatrix} {}^S \dot{p}_1 \\ {}^S \dot{p}_2 \\ \vdots \\ {}^S \dot{p}_N \end{bmatrix}
= \begin{bmatrix} J_1^k(\Phi) \\ J_2^k(\Phi) \\ \vdots \\ J_N^k(\Phi) \end{bmatrix}
\begin{bmatrix} {}^C V_C^k \\ {}^C \Omega_C^k \end{bmatrix}.    (4.13)

The image Jacobian matrix for all points is then defined as

J_I^k(\Phi) = \begin{bmatrix} J_1^k(\Phi) \\ J_2^k(\Phi) \\ \vdots \\ J_N^k(\Phi) \end{bmatrix}.    (4.14)

Let {}^S \dot{P} = [{}^S \dot{p}_1 \;\; {}^S \dot{p}_2 \ldots {}^S \dot{p}_N]^\top and let \dot{\gamma}_k = [{}^C V_C^k \;\; {}^C \Omega_C^k]^\top be the velocity screw; then (4.13) and (4.14) lead to

{}^S \dot{P} = J_I^k(\Phi)\, \dot{\gamma}_k.    (4.15)

At least three feature points and their projections must be available to form a matrix that is not rank deficient. However, having three feature points is not sufficient to guarantee that the image Jacobian matrix is full rank. An image singularity refers to the situation in which JkI(Φ) becomes singular. As mentioned above, N must be greater than or equal to three; otherwise, we would certainly face an image singularity. Deng [5] has studied the previous efforts to investigate the image singularity in all its forms and deduced the following theorem:

Theorem I: The image singularity does not exist when using redundant (N ≥ 4) and non-degenerate image feature point measurements [5].

There are three situations that lead to singularity. First, Michel and Rives [57] pointed out that any three points that form a degenerate triangle result in an image singularity. Second, they pointed out that for three points, the image singularity occurs when the camera optical centre lies on a cylinder that passes through those points and whose axis is along the normal of the plane of the three points. Third, if the depths of the features with respect to the camera frame are equal, and at least one of the feature points projects to the origin of the sensor plane, then an image singularity occurs. Further details can be found in [5].
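A compact numeric sketch of (4.10)-(4.14) is given below: it stacks the 2 × 6 per-point Jacobians for N feature points expressed in the camera frame. The function name and interface are illustrative rather than the thesis implementation.

function JI = imageJacobian(Pc, fC, sx, sy)
% IMAGEJACOBIAN  Stacked image Jacobian of Eqs. (4.10)-(4.14).
%   Pc  3xN feature points expressed in the camera frame [mm]
%   fC  focal length [mm]; sx, sy pixel dimensions [mm/pixel]
    N  = size(Pc, 2);
    JI = zeros(2*N, 6);
    for j = 1:N
        X = Pc(1,j); Y = Pc(2,j); Z = Pc(3,j);
        JP = [ -fC/(sx*Z),  0,          fC*X/(sx*Z^2);       % Eq. (4.10)
                0,         -fC/(sy*Z),  fC*Y/(sy*Z^2) ];
        JO = [ fC*X*Y/(sx*Z^2),     -fC/sx*(1 + (X/Z)^2),  fC*Y/(sx*Z);    % Eq. (4.11)
               fC/sy*(1 + (Y/Z)^2), -fC*X*Y/(sy*Z^2),     -fC*X/(sy*Z) ];
        JI(2*j-1:2*j, :) = [JP, JO];
    end
end

Checking rank(imageJacobian(Pc, 3.5, 1.0007, 1.0007)) < 6 is a simple way to flag an image-singular configuration for a candidate set of feature points Pc.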

The notion of image singularity is extremely important in visual servoing and vision-guided manipulation. If a configuration results in an image singularity, an image-based control strategy will fail, because the overall task Jacobian becomes singular. We shall use the image Jacobian as defined in (4.14) to derive a directional measure and optimize the 3D positions of the feature points in the task frame. Any measure derived from the image Jacobian is, in fact, a function of Φ; we shall elaborate on this when explaining the optimization process. Suppose that the camera frame moves along a known path Γ. By a known path, we mean that the homogeneous transformation that maps the task frame {T} to the camera frame at instant k, denoted by {C}, is known. This assumption is reasonable because the task frame is not moving and can be considered the world frame. We define Γ = {γ1 . . . γL}, where γk is a vector containing the position and orientation of the camera frame at instant k. We shall use this known path to derive a directional measure in Section 4.2.2.

4.1.2 Manipulability ellipsoid

In this section, we first review the classic concept of manipulability ellipsoids [58]. Then, we investigate the related dexterity measures on which we will later base our proposed measure. If the pose of the end-effector in the base frame {B} is represented by ^B Pe, we can write the manipulator velocity equation as

{}^B \dot{P}_e = J(q)\, \dot{q},    (4.16)

where J denotes the geometric Jacobian and q denotes the vector of joint variables in C-space (configuration space). Consider the set of end-effector velocities for which

\dot{q}^\top \dot{q} \le 1.    (4.17)

Such velocities satisfy

{}^B \dot{P}_e^\top\, J^{\dagger\top} J^{\dagger}\, {}^B \dot{P}_e \le 1,    (4.18)

where J† is the pseudo-inverse of J, that is,

J^{\dagger} = (J^\top J)^{-1} J^\top.    (4.19)

The dexterity of a manipulator has been extensively studied in the past. A survey of different dexterity measures (all based on the geometric Jacobian of the manipulator) is given by Kim and Khosla [59]. Here, we provide a brief overview of the existing measures. The manipulability ellipsoid is determined by the eigenvectors of JJ⊤, where J is the geometric Jacobian matrix of the manipulator. The manipulability measure, or simply manipulability, is defined as

w_m = \sqrt{\det(J J^\top)} = \sigma_1 \sigma_2 \ldots \sigma_n,    (4.20)

where σ1 ≥ σ2 ≥ . . . ≥ σn are the singular values of the Jacobian matrix, which has rank n. This measure is proportional to the volume of the manipulability ellipsoid. Since the calculation of kinematic and dynamic singularities is often computationally expensive, the manipulability ellipsoid provides a region of interest within which the geometric Jacobian is guaranteed to be full rank. This has also led to the definition of task-specific ellipsoids, as will be discussed in the next subsections. The condition number of the Jacobian matrix can also serve as a dexterity measure:

c_J = \frac{\sigma_1}{\sigma_n}.    (4.21)

However, the condition number is usually used as a measure to detect proximity to singularities and also as a measure of kinematic accuracy [59].

Last but not least, the minimum singular value, σn , is sometimes used as a dexterity measure. This measure represents the smallest velocity transmission ratio [59, 60].

4.1.3 Direction dexterity measure

If the task path is known, a directional dexterity measure involving the whole trajectory must be used. Zlajpah [60] introduced a direction dexterity measure that can be used along a specific task. Let the known path be a parametric function

{}^B P_e = f(b),    (4.22)

where f(·) is an n-dimensional vector function and b is a scalar parameter. The path direction, D, is given by

D(b) = \frac{{}^B \dot{P}_e}{\| {}^B \dot{P}_e \|} = \frac{df/db}{\| df/db \|},    (4.23)

where ‖·‖ is the L2-norm. Let a = ‖^B Ṗe‖, which yields ^B Ṗe = aD. The ellipsoid (4.18) can be rewritten as

a\, D^\top J^{\dagger\top} J^{\dagger} D\, a \le 1.    (4.24)

The direction dexterity measure is then defined as

v = \frac{a}{\|\dot{q}\|} = \frac{1}{\sqrt{D^\top (J^{\dagger\top} J^{\dagger}) D}} = \frac{1}{\| J^{\dagger} D \|}.    (4.25)

4.1.4 Vision resolvability and motion perceptibility

One of the underlying problems in 3D robot vision is the ability to determine the 3D motion of parts from the 2D motion of the image features. This has been extensively studied over the past 15 years. Nelson and Khosla [61, 56] introduced the vision resolvability (also known as motion resolvability) ellipsoids from the Singular Value Decomposition (SVD) of the image Jacobian defined in (4.14). They also introduced a similar force resolvability ellipsoid for the case where both force feedback and vision feedback are available [62]. The motion resolvability ellipsoid shows the directions in which the relative motion of an object with respect to the camera is resolvable. If the ellipsoid is stretched in one direction, the relative motion is less resolvable in the other, perpendicular directions; a near-spherical ellipsoid is desirable in omnidirectional motion applications. A 6D ellipsoid cannot be represented geometrically. Furthermore, the 6D space is non-homogeneous because it contains both position and orientation components. It is worth separating the image Jacobian into the two 2 × 3 matrices given in (4.10) and (4.11), so that two homogeneous, geometrically representable ellipsoids result.

Sharma and Hutchinson [63] introduced the scalar, quantifiable motion perceptibility measure. The basic idea is the same as that introduced by Nelson and Khosla, but they elaborate more on the derivation of the ellipsoids. Their main contribution is a composite measure, which is quite simple in nature but includes the geometric Jacobian as well as the image Jacobian. They claim that, being a scalar, the motion perceptibility is easier to deal with in practice. This is true for the first three applications introduced in their paper, namely optimal camera positioning, active camera trajectory planning, and simultaneous camera/robot trajectory planning. However, they analyze the directional behaviour of their measure when explaining their last example, critical directions for hand/eye motion, which undermines the claimed general superiority of a scalar measure.

4.2 Design for Directional Motion Resolvability

With the above background on the image Jacobian, vision resolvability, motion perceptibility, and directional dexterity, we have the right tools to mathematically state the problem of optimal design for directional motion resolvability.

4.2.1 Problem statement

The problem is stated as the optimal design of feature point locations in the task space, in order to maximize a directional (path-specific) motion resolvability measure, Λ, for a given 6D camera trajectory profile Γ and fixed camera intrinsic parameters ν. A multi-objective optimization framework is adopted, where the objectives are defined for each discrete point on Γ as a locally determined measure. The vector of design variables (a point in design space) Tχ ∈ R^{3N} is defined as the 3D positions of the N feature points in the task frame, resulting in a vector of 3N elements. The problem is posed as

\max_{{}^T\chi} \; \Lambda({}^T\chi),    (4.26)

where Λ(Tχ) = [Λ1(Tχ) Λ2(Tχ) . . . ΛL(Tχ)]⊤, L is the number of discrete points on Γ = {γ1, . . . , γL}, the trajectory of the camera frame expressed in the task frame, and γk = [xCk yCk zCk θxC θyC θzC]⊤ is the pose of the camera at instant k. The orientation angles are chosen as absolute angles to be consistent with (4.3); however, other orientation representations, such as Euler angles, can be used to define the camera pose. The directional motion resolvability measure at each γk is denoted by Λk and is a function of the design variable Tχ. See Fig. 4.2 for an illustration of the problem for 5 feature points.

4.2.2 Directional motion resolvability measure

In order to define the directional motion resolvability measure Λk, several strategies may be chosen. We introduce a measure similar to the directional dexterity measure reviewed in Section 4.1.3, where the end-effector trajectory is constrained to a known path. The exact same strategy cannot be used: although we still have the end-effector trajectory as input, a closer look at the image Jacobian equation reveals that the motion trajectory in the image space must be available. The resolvability ellipsoid is derived from (4.15) and \dot{\gamma}_k^\top \dot{\gamma}_k \le 1:

{}^S \dot{P}^\top \left( J_I^{k\dagger\top} J_I^{k\dagger} \right) {}^S \dot{P} \le 1.    (4.27)

Let γk be a parametric function of the trajectory profile of the camera frame motion,

\gamma_k = g(b),    (4.28)

where g(·) is a 6-dimensional vector function and b is a scalar parameter. Since the feature points project onto the image plane according to (4.5) and (4.6), the motion profile of the camera frame maps nonlinearly to different motion profiles of the image features in the image plane. This implies that

{}^S P = f(b),    (4.29)

where f(·) is the 2N-dimensional mapping of the projections, and b is the same scalar parameter as in (4.28). Similar to (4.23), the projected path direction, D, is given by

D(b) = \frac{{}^S \dot{P}}{\| {}^S \dot{P} \|} = \frac{df/db}{\| df/db \|}.    (4.30)

Let a = ‖^S Ṗ‖, which yields ^S Ṗ = aD. The ellipsoid can be rewritten as

a\, D^\top \left( J_I^{k\dagger\top} J_I^{k\dagger} \right) D\, a \le 1.    (4.31)

This equation differs from (4.24) in that the image Jacobian is used and the direction is defined in the image plane.

Figure 4.2: Illustration of a camera on a known path Γ = {γ1, γ2, . . . , γL}. The camera frame {C} is different at each γk for k = 1 . . . L. Consequently, the feature representations in the different camera frames, Cχ, and the representation of the features in the task frame, Tχ, are all different.

The proposed directional motion resolvability measure is then defined as

\Lambda_k = \frac{a}{\|\dot{\gamma}_k\|} = \frac{1}{\sqrt{D^\top J_I^{k\dagger\top} J_I^{k\dagger} D}} = \frac{1}{\| J_I^{k\dagger} D \|}.    (4.32)

It should be noted that, for a given direction in Cartesian space, the measure is larger when more feature points are used, since ‖^S Ṗ‖ grows as the number of feature points increases. This matches expectations: with more feature points, it is more likely that the image Jacobian matrix resolves motion in a broader range of directions.
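A minimal numeric sketch of (4.30)-(4.32) is given below. It assumes the imageJacobian helper sketched earlier in this chapter and approximates the image-space path direction by a finite difference between two consecutive instants; the feature point values are arbitrary toy numbers.

% Directional motion resolvability at one trajectory index (sketch).
fC = 3.5; sx = 1.0007; sy = 1.0007;
PcK   = [ 2 -2  2 -2;  2  2 -2 -2; 30 30 30 30 ];       % feature points at instant k (toy values)
PcKp1 = PcK + repmat([0.1; 0; -0.2], 1, 4);             % same points after a small camera motion

proj = @(Pc) [fC*Pc(1,:)./(sx*Pc(3,:)); fC*Pc(2,:)./(sy*Pc(3,:))];   % Eqs. (4.5)-(4.6)
dSP  = proj(PcKp1) - proj(PcK);              % finite-difference image-space motion
D    = dSP(:) / norm(dSP(:));                % projected path direction, Eq. (4.30)

JI      = imageJacobian(PcK, fC, sx, sy);    % stacked image Jacobian, Eq. (4.14)
LambdaK = 1 / norm(pinv(JI) * D);            % directional resolvability measure, Eq. (4.32)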

4.2.3 Multi-objective optimization

Of the several approaches that can be taken to find the Pareto optimal design variable, we use the weighted sum of the objectives as the objective function to be maximized. Since the elements of Λ(Tχ) have the same modality and differ only because they are computed at different relative pose instances, one can assign equal weights to the different objectives in the weighted sum strategy. Henceforth, the objective function to be maximized is

H({}^T\chi) = \sum_{i=1}^{L} \Lambda_i({}^T\chi).    (4.33)

The Optimization Toolbox of MATLAB is used to implement the method. For simulation convenience, we minimize F(Tχ) = −H(Tχ).
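A sketch of the one-dimensional bounded optimization used in the simulations (design variable d, the planar-square edge length of Fig. 5.1(a), with 2 mm ≤ d ≤ 6 mm) is shown below. It reuses the imageJacobian helper sketched earlier; the synthetic depth profile and the fixed image-space direction are simplifying assumptions, not the thesis trajectories.

% Minimize F(d) = -H(d) = -sum_k Lambda_k(d) over a bounded edge length d (sketch).
fC = 3.5; sx = 1.0007; sy = 1.0007;
L     = 20;                                  % number of discrete trajectory points
depth = linspace(30, 40, L);                 % synthetic camera-to-object depth profile [mm]
D     = repmat([1; 0], 4, 1);  D = D / norm(D);   % fixed image-space direction (toy)

squarePts = @(d, k) (d/2)*[1 1 -1 -1; 1 -1 1 -1; 0 0 0 0] + repmat([0; 0; depth(k)], 1, 4);
LambdaK   = @(d, k) 1 / norm(pinv(imageJacobian(squarePts(d, k), fC, sx, sy)) * D);  % Eq. (4.32)

F    = @(d) -sum(arrayfun(@(k) LambdaK(d, k), 1:L));   % negated objective of Eq. (4.33)
dOpt = fminbnd(F, 2, 6);                               % bounded search, 2 mm <= d <= 6 mm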

4.3 Analysis

In this section, we seek an analytical solution for the SVD of the image Jacobian and demonstrate the complexity of evaluating its singular values even in simple cases. Suppose that fC = 3.5 mm and sx = sy = 1.0007 (see Appendix A for the camera calibration process). Without loss of generality, we can neglect the orientation compartment of the image Jacobian and consider only its position compartment (Eq. 4.10), which evaluates to

J_{P,1} = -3.49755 \begin{bmatrix} \dfrac{1}{Z_1} & 0 & -\dfrac{X_1}{Z_1^2} \\[1ex] 0 & \dfrac{1}{Z_1} & -\dfrac{Y_1}{Z_1^2} \end{bmatrix},

where the position of a single feature point, expressed in the camera frame, is assumed to be P1 = [X1 Y1 Z1]⊤. The singular values of this matrix are the square roots of the eigenvalues of J_{P,1} J_{P,1}^⊤. We used Maple 9.5 to evaluate the singular values. Since Maple was incapable of evaluating the parametric singular values directly, we used a step-by-step approach to derive them. First, we find that

J_{P,1} J_{P,1}^\top = 12.2329 \begin{bmatrix} \dfrac{1}{Z_1^2} + \dfrac{X_1^2}{Z_1^4} & \dfrac{X_1 Y_1}{Z_1^4} \\[1ex] \dfrac{X_1 Y_1}{Z_1^4} & \dfrac{1}{Z_1^2} + \dfrac{Y_1^2}{Z_1^4} \end{bmatrix}.

The eigenvalues of the normalized matrix are the roots of det(J_{P,1} J_{P,1}^⊤ − kI) = 0, i.e.,

\left( \frac{1}{Z_1^2} + \frac{X_1^2}{Z_1^4} - k \right)\left( \frac{1}{Z_1^2} + \frac{Y_1^2}{Z_1^4} - k \right) - \left( \frac{X_1 Y_1}{Z_1^4} \right)^2 = 0
\;\;\Longleftrightarrow\;\;
\left( k - \frac{1}{Z_1^2} \right)\left( k - \frac{X_1^2 + Y_1^2 + Z_1^2}{Z_1^4} \right) = 0.

The two singular values (up to the common scale factor 3.49755) are then found to be 1/Z_1 and \sqrt{X_1^2 + Y_1^2 + Z_1^2}/Z_1^2. As X1 and Y1 approach zero, the two singular values approach each other and the motion resolvability ellipse approaches a circle. In other words, if a feature point lies on the optical axis, the vision resolvability ellipse is in fact a circle and motion is equally resolvable in both directions.

The above analytical solution does not simply generalize to more feature points. Suppose that two feature points, denoted by P1 = [X1 Y1 Z1]⊤ and P2 = [X2 Y2 Z2]⊤, are available and expressed in the camera frame. The position part of the image Jacobian matrix is

J_P = -3.49755 \begin{bmatrix}
\dfrac{1}{Z_1} & 0 & -\dfrac{X_1}{Z_1^2} \\[1ex]
0 & \dfrac{1}{Z_1} & -\dfrac{Y_1}{Z_1^2} \\[1ex]
\dfrac{1}{Z_2} & 0 & -\dfrac{X_2}{Z_2^2} \\[1ex]
0 & \dfrac{1}{Z_2} & -\dfrac{Y_2}{Z_2^2}
\end{bmatrix}.

Accordingly,

J_P^\top J_P = 12.2329 \begin{bmatrix}
\dfrac{1}{Z_1^2} + \dfrac{1}{Z_2^2} & 0 & -\dfrac{X_1}{Z_1^3} - \dfrac{X_2}{Z_2^3} \\[1ex]
0 & \dfrac{1}{Z_1^2} + \dfrac{1}{Z_2^2} & -\dfrac{Y_1}{Z_1^3} - \dfrac{Y_2}{Z_2^3} \\[1ex]
-\dfrac{X_1}{Z_1^3} - \dfrac{X_2}{Z_2^3} & -\dfrac{Y_1}{Z_1^3} - \dfrac{Y_2}{Z_2^3} & \dfrac{X_1^2 + Y_1^2}{Z_1^4} + \dfrac{X_2^2 + Y_2^2}{Z_2^4}
\end{bmatrix}.

Similar to the previous analytical solution, the square roots of the solutions of det(J_P^⊤ J_P − kI) = 0 for k give the singular values of J_P (again up to the common scale factor). However, the three solutions have the following complex forms:

k_1 = \frac{Z_1^2 + Z_2^2}{Z_1^2 Z_2^2},
\qquad
k_{2,3} = \frac{Z_1^2 Z_2^4 + Z_1^4 Z_2^2 + X_1^2 Z_2^4 + Y_1^2 Z_2^4 + Z_1^4 X_2^2 + Z_1^4 Y_2^2 \pm \sqrt{\Delta}}{2\, Z_1^4 Z_2^4},

where

\Delta = Z_1^8 Z_2^4 + Z_1^8 Y_2^4 + Z_1^8 X_2^4 + Y_1^4 Z_2^8 + X_1^4 Z_2^8 - 2 Z_1^6 Z_2^4 Y_2^2 - 2 Z_1^6 Z_2^4 X_2^2 - 2 Z_1^4 Z_2^6 Y_1^2 - 2 Z_1^4 Z_2^6 X_1^2 + 2 Z_1^8 Z_2^2 X_2^2 + 2 Z_1^8 Z_2^2 Y_2^2 + 2 Z_1^2 Z_2^8 Y_1^2 + 2 Z_1^2 Z_2^8 X_1^2 + 2 Z_1^8 Y_2^2 X_2^2 + 2 Y_1^2 Z_2^8 X_1^2 + 2 Z_1^4 Y_2^2 Y_1^2 Z_2^4 + Z_1^4 Z_2^8 + 2 Z_1^6 Z_2^6 + 8 Z_1^5 Z_2^5 X_1 X_2 + 8 Z_1^5 Z_2^5 Y_1 Y_2 + 2 Z_1^4 Y_2^2 X_1^2 Z_2^4 + 2 Z_1^4 X_2^2 Y_1^2 Z_2^4 + 2 Z_1^4 X_2^2 X_1^2 Z_2^4.

The above analysis shows that even for two feature points, and considering only the position part of the image Jacobian, the analytical solution is very complicated to derive.
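The single-point result above can be verified numerically. The MATLAB sketch below compares the SVD of JP,1 against the closed-form values scaled by the common factor fC/sx ≈ 3.49755; the test point coordinates are arbitrary.

% Numeric check of the single-point singular values (sketch).
fC = 3.5; s = 1.0007; kf = fC/s;          % common scale factor, about 3.49755
X1 = 4; Y1 = -2; Z1 = 30;                 % arbitrary test point in the camera frame [mm]

JP1 = -kf * [ 1/Z1, 0, -X1/Z1^2;
              0, 1/Z1, -Y1/Z1^2 ];

numeric  = sort(svd(JP1));                                     % singular values from the SVD
analytic = sort(kf * [1/Z1; sqrt(X1^2 + Y1^2 + Z1^2)/Z1^2]);   % scaled closed-form values
disp(max(abs(numeric - analytic)))        % should be near machine precision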

4.4 Summary

A new directional relative-motion resolvability measure was introduced to improve the design of objects used in vision-guided manipulation. The measure is evaluated at discrete points on a camera trajectory, given that the camera trajectory, the camera intrinsic parameters, and the image Jacobian matrix are known. The motion of the end-effector in Cartesian space is translated into motion of the image features on the image plane, and the direction of the latter motion is used in the proposed measure. A bounded multi-objective optimization approach is successfully utilized to find the Pareto optimal design variable. Without loss of generality, the simulations will be performed for the simple case of one-dimensional optimization (see Chapter 5).

It was shown that even for two feature points, the singular values of the position part of the image Jacobian are complicated to derive. Therefore, the analytic form of the singular values cannot be easily used in the formulation of the directional motion resolvability measure.


Chapter 5

Simulation Results

5.1 Introduction

In this chapter, simulation results are presented to support the proposed theory of optimal feature design for vision-guided manipulation. The chapter is organized as follows. The object models are described in Section 5.2. The relative camera-to-object 6-dimensional trajectories are provided in Section 5.3. The results are presented in Section 5.4, where the GUI is introduced first, followed by the evaluation of some of the measures; the results of the design for the directional motion resolvability measure (Chapter 4) and of the design combined with the visual measures (Chapter 3) are given in Sections 5.4.3 and 5.4.4. The simulation results are discussed in Section 5.5. Finally, the chapter is summarized in Section 5.6.

5.2 Object Models

Many object models have been developed for simulation purposes. The feature patterns are motivated by the form features introduced in [33]. Each object model is characterized by feature parameters; for example, if a hole feature is considered, the radius of that hole can be taken as a parameter. During the optimization process, the variable parameters of the features, such as their locations with respect to the object frame and other feature parameters, are found such that the measures introduced in the preceding chapters are optimized. Two classes of object characteristics are considered in this work: (a) 3D locations of the feature points and their 2D projections on the image plane, and (b) circular hole features, along with their first-order moments and their ellipse projections on the image plane.

5.2.1 Objects with 3D feature points

The first class consists of feature points only, i.e., the 3D locations of feature points. This class is used to evaluate the field-of-view, focus, and directional motion resolvability measures. The three object models in this class are shown in Fig. 5.1. The first model consists of four coplanar feature points that form a square, as shown in Fig. 5.1 (a); the variable that is changed during the optimization is the side of the square, which is assumed to be bounded. The second object model again contains four coplanar feature points; however, the fourth feature point is twice as far away as in the previous model, as depicted in Fig. 5.1 (b). Finally, the third model consists of eight non-coplanar feature points distributed on the vertices of a cube (Fig. 5.1 (c)); the design variable modified throughout the optimization process is the length of the edge of the cube.

5.2.2 Objects with circular holes

The second class of objects consists of hole features, which are modeled by circles in the task frame. This class is used to evaluate the pixel resolution measure in conjunction with the previously mentioned measures. In this work, the centre of each hole is taken as the feature point associated with that hole; this is also known as the first-order moment of the hole feature. Higher-order moments are potentially better candidates to work with, as they are robust to image processing noise during extraction; however, they create a higher computational burden.


Figure 5.1: Object models: (a) four coplanar points on a square, (b) four coplanar points not on a square, and (c) eight spatial points that form a cube.

Single hole

The simplest form of a single hole feature and its projection onto the image plane are shown in Fig. 5.2 (a) and (b). The radius of the hole is chosen as 3.70 mm. The camera frame is positioned at [0, 0, 28]⊤ mm and the object frame is positioned at the centre of the world frame.

Multiple holes

Another object model that has been developed is a set of four hole features with equal radii, three of which are coplanar (Fig. 5.3). The distances of the centres of the holes from each other and the radii can be changed to obtain different objects. For simplicity, all radii are assumed to be equal and the distances are parameterized with one variable. The illustrated object has a distance parameter d = 6.79 mm and a radius parameter Ro = 3.10 mm. The camera frame is positioned at [0, 0, 32]⊤ and the object frame is positioned at [0, 2, −28]⊤.



Figure 5.2: Circular hole feature. (a) A single hole feature shown in 3D Cartesian space; the camera frame (top) and the object frame (bottom) are shown, and the axes are in mm. (b) The projection of the hole feature onto the image plane; the 128 × 128 (pixel) block shows the active area of the image sensor. A larger coordinate range is shown in order to observe field-of-view infringements.

Circular pattern of holes

Different feature patterns are common in mechanical engineering design [33]. Notably, a circular pattern of hole features appears on many objects in the automotive industry, such as front hubs, braking discs, and brake drums [64] (Fig. 5.4). We have also modeled this circular pattern (Fig. 5.5). Different parameters can be adjusted to change the design, such as the radii of the holes and the radius of the circle on which the centres of the smaller holes lie. In the example shown, the radius of the largest hole is 5.0 mm, the radius of the circle on which the centres of the smaller holes are positioned is 9.65 mm, and the small holes have a radius of 2.5 mm. The camera frame is positioned at [0, 0, 34]⊤ and the object frame is positioned at [0, 2, −24]⊤.


Figure 5.3: Multiple circular hole features, (a) and (b).

One-dimensional array

Another type of feature pattern is the one-dimensional array [33]. Such feature sets have been modeled and can be selected from the GUI (Fig. 5.6). The holes are assumed to be equidistant, with a spacing of 8.57 mm. The radius of the hole features in the example shown is 2.5 mm. The camera frame is again positioned at [0, 0, 34]⊤ and the object frame is positioned at [0, 2, −24]⊤. Two-dimensional arrays have also been modeled and can be selected from the GUI, as seen in Fig. 5.8.



Figure 5.4: Motivation behind modeling circular patterns: different views of (a) a front hub, (b) a front braking disc, and (c) a rear brake drum [64].

5.3 Relative Camera-to-Object Trajectory

The optimized value of the design parameter depends on the choice of the camera frame trajectory. One can assume that the camera frame is constant and that the task frame moves relative to the camera frame. In Fig. 5.7 (a)-(c), the relative pose trajectory and the corresponding feature point trajectories on the image plane are shown; the directions of these trajectories are shown in Fig. 5.7 (d)-(f). Three other trajectories are also considered. The trajectories can easily be selected using a pop-up menu in the developed GUI. When designing a new trajectory, it is very important not to use large ranges of angles, as the features may easily fall outside the field-of-view of the camera. Also, note that some trajectories may not admit any feasible design, since one or more of the constraints may be violated at every point in the design space.



Figure 5.5: Circular pattern of hole features, (a) 3D view with camera and object frames and (b) 2D image of the pattern.

5.4 Results

5.4.1 Graphical user interface

A MATLAB graphical user interface (GUI) application was developed to ease the demonstration of the simulations. A snapshot of the developed GUI is shown in Fig. 5.8. The object model can be selected with a pop-up menu; a two-dimensional array is chosen in the snapshot shown. The trajectory of the camera frame can be chosen with another pop-up menu from the pool of programmed trajectories. The trajectory of the camera expressed in the world frame depends on the pose of the camera frame, which can be set with the six slide bars on the right side of the GUI. The object frame can also be adjusted using another set of six slide bars. Two object parameters have been assumed for each object; these two parameters can be adjusted using two slide bars.



Figure 5.6: Multiple circular hole features, (a) and (b).

Finally, the desired measures can be selected with check boxes. The optimization code runs for the selected measures, and only if a trajectory is selected; the initial value of the design variable is read from the object parameter slide bars. If no trajectory is selected, the measure is calculated for a single camera pose only, without optimization.

5.4.2 Evaluation of the field-of-view measure

To evaluate the field-of-view measure using (3.3), an object with eight feature points is considered, as shown in Fig. 5.9 (a). The object frame is positioned at [0, 0, 0]⊤ and rotated with a yaw angle of 0.377 rad. In Fig. 5.9 (b), the image of the object is shown when the camera is positioned at [20, 0, 128]⊤ with zero angles. As can be seen, three of the feature points fall outside the field-of-view; these feature points are indexed 1, 2, and 8 in Table 5.1.


Figure 5.7: (a) Relative position of the camera frame and the task frame, (b) relative orientation of the camera frame and the task frame (Euler angles are shown, since they were used in the simulations for convenience), (c) projections of the eight feature points on the image plane, (d) direction of relative position change, (e) direction of relative orientation change, and (f) direction of feature-point motion on the image plane due to the motion of the camera. The coordinate L is the index of the camera frame on Γ.

In Fig. 5.9 (c), the image of the same object is shown when the camera frame is positioned at [0, 0, 128]⊤. All of the feature points fall inside the field-of-view, which is also verified in Table 5.1.
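As a rough stand-in for the field-of-view measure of (3.3) (a sketch only; the actual measure is defined in Chapter 3, and the function name below is hypothetical), a signed margin to the image border can be computed for each projected feature point; it is positive when the point lies inside the image and negative otherwise, matching the sign pattern seen in Table 5.1.

% Sketch: a simple field-of-view check for a projected feature point.
% This is only a stand-in for the measure defined in (3.3).
function m = fov_margin(Sx, Sy, width, height)
% Sx, Sy        : image coordinates of one feature point [pixels]
% width, height : image size [pixels], e.g., 128 x 128 for the DALSA camera
mx = min(Sx, width  - Sx);     % distance to the nearer vertical border
my = min(Sy, height - Sy);     % distance to the nearer horizontal border
m  = min(mx, my);              % negative if the point falls outside the image
end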

5.4.3 Design for directional motion resolvability

The three object models in Fig. 5.1 were used to evaluate the effectiveness of the proposed directional motion resolvability measure. The design variable ^Tχ is related to the length of the edge d,


Figure 5.8: Snapshot of the developed GUI

where 2 mm ≤ d ≤ 6 mm. For example, Fig. 5.1(c) has the following eight feature points:

^{T}\chi = \frac{1}{2}\begin{bmatrix} d & d & d \\ d & d & -d \\ d & -d & d \\ d & -d & -d \\ -d & d & d \\ -d & d & -d \\ -d & -d & d \\ -d & -d & -d \end{bmatrix}    (5.1)
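As a minimal illustration (a MATLAB sketch, not the thesis implementation; the value of d below is only an example), these cube-vertex feature points can be generated as a function of the design parameter:

% Sketch: cube-vertex feature points as a function of the design parameter d.
% Each row of the 8x3 matrix is one feature point in the object frame, as in (5.1).
d = 4;                                     % design parameter in mm (example value only)
signs = [ 1  1  1;  1  1 -1;  1 -1  1;  1 -1 -1;
         -1  1  1; -1  1 -1; -1 -1  1; -1 -1 -1];
T_chi = 0.5 * d * signs;                   % 8x3 matrix of feature points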

In terms of the trajectory and its parametrization, as pointed out in (4.28), the camera trajectory maps to a corresponding trajectory of the feature points on the image plane, as shown in (4.29). In practice, the optimization process is based on the latter. This trajectory is shown in Fig. 5.7 (c) and (f).


Figure 5.9: Evaluation of the field-of-view measure, (a) an industrial object with eight feature points, (b) image of the object when the three rightmost feature points fall outside of the field-of-view, and (c) image of the same object with all the feature points inside the field-of-view.

In Fig. 5.1 (a), the four feature points are coplanar and form a square. The objective function F(^Tχ) is minimized at the lower bound of d, i.e., at d = 2 mm; for the definition of the objective function, refer to Section 4.2.3 and (4.33). Fig. 5.10 (a) shows the objective function for this test case. It should be noted that the image Jacobian tends to become ill-conditioned when the feature points get closer to each other. As a result, F(^Tχ) becomes smaller (its absolute value becomes larger), hence the odd results. One conclusion that can be drawn is that the configuration of the feature points in space is very important for the success of a vision system that uses the pseudo-inverse of the image Jacobian. With the second studied object (Fig. 5.1 (b)), the four coplanar feature points do not lie on a square. This case is more interesting to study, as the asymmetry prevents the ill-conditioning of the image Jacobian that may occur along the trajectory. As seen in Fig. 5.10 (b), the minimizer of the objective function depends on the range of d: for example, if 2 mm ≤ d ≤ 3 mm, the minimum is attained at d = 2 mm.

Table 5.1: Evaluation of the field-of-view measures for Fig. 5.9.

Feature Index     1         2         3        4        5        6        7        8
Fig. 5.9 (b)   -0.0320   -0.0101   0.0075   0.0283   0.0290   0.0316   0.0338  -0.0058
Fig. 5.9 (c)    0.0093    0.0203   0.0295   0.0295   0.0203   0.0093   0.0251   0.0251

However, if 2 mm ≤ d ≤ 6 mm, the objective function is minimized at d = 6 mm. This result highlights the importance of feature-point design in vision-guided manipulation. If, during the camera motion, the images of the feature points fall near each other on the image plane or occlude one another, an ill-conditioned image Jacobian matrix is obtained. For example, if the viewing axis of the camera lies in the plane of the feature points, motion is less resolvable along the camera view axis. Since the camera trajectory is given as input, if somewhere on the trajectory the camera moves in a direction in which the motion is poorly resolvable, the local objective function at that pose is undesirable and degrades the weighted sum of objectives. With the third studied object (Fig. 5.1 (c)), the eight feature points are distributed in 3D space rather than on a plane, and the objective function exhibits quasi-linear behaviour (Fig. 5.10 (c)). This suggests that increasing the design parameter yields a better design; that is, given the sample trajectory, when the feature points are designed farther from each other, motion is directionally better resolved.


Figure 5.10: Objective functions versus the design parameter: (a) first case, object model of Fig. 5.1(a), (b) second case, object model of Fig. 5.1(b), and (c) third case, object model of Fig. 5.1(c).

5.4.4 Design for combined measures

In the previous section, the only measure considered was directional motion resolvability. It was seen that, for the last object (the cube), the measure was desirably quasi-linear in the design parameter. In this section, we add two more visual measures, namely the field-of-view and the focus, and recalculate the optimized design. Two changes have been made with respect to Section 5.4.3: (1) the trajectory of the camera frame is changed slightly, such that the visual measures are not positive (refer to Section 4.2.3 and (4.33) for the definition of the measures along the path), and (2) the range of the design variable has been reduced to reach a feasible solution. These two changes cause the directional motion resolvability graph in Fig. 5.11 (a) to differ slightly from Fig. 5.10 (c). The visual measures are illustrated in Fig. 5.11 (b) and (c). The combined design problem is posed as the min-max problem

\min_{^{T}\chi} \; \max\{\, g_{FOV}(^{T}\chi),\; g_{DIR}(^{T}\chi),\; g_{FOC}(^{T}\chi) \,\},    (5.2)

where gFOV is the objective function for the field-of-view measure, gDIR is the objective function for the directional motion resolvability measure, and gFOC is the objective function for the focus measure. Note that in this formulation the objective functions are the negatives of the measures, because the MATLAB® Optimization Toolbox provides built-in functions that find minima; to maximize an objective function, one simply negates it. The optimized solution is d = 1.41 mm, as seen in Fig. 5.11.
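One way to set up (5.2) is with fminimax from the Optimization Toolbox; this is only a sketch under assumptions, since the thesis does not state which routine was used, and the handles gFOV, gDIR, and gFOC below are hypothetical stand-ins for the Chapter 4 measures (already negated so that smaller is better) evaluated along the given camera trajectory.

% Sketch: min-max design over the scalar design parameter d using fminimax.
% gFOV, gDIR, gFOC are assumed to exist as functions of d (placeholders).
objectives = @(d) [gFOV(d); gDIR(d); gFOC(d)];   % vector of objective values
d0 = 2.0;                                        % initial value from the GUI slide bar (example)
lb = 0.5; ub = 3.5;                              % feasible range of d in mm (example bounds)
opts = optimset('Display', 'iter');
[d_opt, g_opt] = fminimax(objectives, d0, [], [], [], [], lb, ub, [], opts);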

5.5 Discussion of Results 5.5.1 Directional motion resolvability The results of the four coplanar feature points on a square (first case) are not desired, however, examining the condition number of the image Jacobian matrix reveals that with the used trajectory and the design variable, the image Jacobian is ill-conditioned. This happens because at some time instants, the camera is posed such that the projections of the feature points are very close to each other or some features occlude some others. The results of the four coplanar feature points (second case) shows that with the same trajectory used in the first case, the range of the design parameter plays a role in convergence of the optimization process to the global minimum. The results of the third case, where eight feature points are spatially distributed as vertices of a cube, are as expected. That is, with the increase in the design variable, the feature points tend to move farther from each other and the motion is directionally better resolved for the larger object. Several other trajectories were used with the third case to see whether the objective function is well-defined for all of the other trajectories.
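As an illustrative check of this ill-conditioning (a sketch only; the exact image Jacobian follows the form defined in Chapter 4, and the point-feature interaction matrix below uses one common convention whose signs may differ from the thesis), the conditioning can be monitored at each camera pose along the trajectory:

% Sketch: condition number of the stacked image Jacobian at one camera pose.
function kappa = jacobian_condition(xy, Z, f)
% xy : N-by-2 image coordinates of the feature points (metric units)
% Z  : N-by-1 depths of the points in the camera frame
% f  : focal length
N = size(xy, 1);
J = zeros(2*N, 6);
for i = 1:N
    x = xy(i,1); y = xy(i,2);
    J(2*i-1,:) = [-f/Z(i), 0, x/Z(i), x*y/f, -(f^2 + x^2)/f,  y];
    J(2*i,  :) = [0, -f/Z(i), y/Z(i), (f^2 + y^2)/f, -x*y/f, -x];
end
kappa = cond(J);   % a large value flags ill-conditioning at this pose
end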


5.5.2 Combined visual measures and resolvability

If a feature point is not in focus, its extraction might introduce some noise. An appropriate feature extraction algorithm should tolerate a certain level of defocus. In the simulation results presented in Section 5.4.4, the focus objective function is always smaller than the maximum of the other two objective functions. This implies that, for the given camera trajectory, the field-of-view and directional motion resolvability measures together dominate the focus measure. Another point is that a desirable design solution must satisfy all the objectives at the same time, i.e., all the objective functions must be negative. This requirement significantly tightens the feasible design space in the multi-objective optimization.

5.6 Summary

The simulation results were presented in this chapter. A MATLAB® GUI was developed both to visualize and to evaluate the different objectives prior to performing the optimization. Numerous object models were developed, and the different measures were evaluated for a given camera frame. The simulation results for the directional motion resolvability measure were presented and discussed; it was observed that higher feature complexity improves the behaviour of the objective function. The simulation results on the visual measures of field-of-view and focus were combined with the directional motion resolvability measure to mathematically formulate the functional requirement Fn+1 discussed in Section 2.2.2. The design parameter (Section 2.2.3) is the 3D location of the feature points in the object frame.


Figure 5.11: Normalized objective functions vs. the design parameter d, (a) directional motion resolvability measure, (b) field-of-view measure, (c) focus (depth-of-field) measure, and (d) min-max solution of the multi-objective functions.


Chapter 6

Experimental Results

6.1 Introduction

The experimental results consist of four main parts: camera calibration, image processing and feature selection, pose estimation, and validation of the simulation results for different objects and different trajectories. The camera calibration results are presented in Appendix A. A detailed version of the camera calibration and a complete description of the image processing and feature selection can be found in [25]. Window-based binary image processing is used for fast extraction and tracking of features; the details are not reproduced in this thesis. It is important to note that fast binary image processing techniques are very sensitive to lighting. Most industrial objects have shiny metal surfaces that can reflect light directly onto the sensor plane, saturating the CCD arrays. The hardware setup is introduced in Section 6.2. Pose estimation is studied in Section 6.3. The nonlinear equations of relative pose and the formulations of the EKF and IEKF are given in Sections 6.3.1 and 6.3.2, respectively. Simulation and experimental results on the sensitivity of the pose estimates to uncertainties in different parameters and on the tuning of the filter parameters are presented in Section 6.3.3. Experimental verification of new designs is left for future work.

6.2 Hardware Setup

The experimental setup is shown in Fig. 6.1. We used a hand-in-eye configuration, i.e., the camera was mounted on the endpoint of the manipulator. For the manipulator, we first used a PUMA 560 robot, which has 6 degrees of freedom. The robot was later taken down for maintenance, and the experiments were continued on a Mitutoyo Configurable Measuring Machine (CMM) with an accuracy of 0.001 mm in the 3D Cartesian coordinates, which is higher than what we actually needed. One issue with the CMM is the placement of the camera: it is extremely difficult to place the camera's view axis, ZC, perpendicular to the horizontal plane of the features, XT-YT, and this generates a deviation in the orientation parameters. The camera was a DALSA CAD1-128A area-scan camera [65] with a Rodenstock 3.5 mm lens, connected to the processing unit through a Matrox Meteor II digital frame grabber. The camera generates 128 × 128 8-bit gray-level images and has a rather small field-of-view cone. The object under study was an industrial part used in the construction of puppets.

6.3 Pose Estimation Using Kalman Filters

As previously mentioned, the designed features are to be used in vision-guided control of manipulators. The first step in generating the feedback control signal is to find the relative pose of the manipulator with respect to the object. In this section, pose estimation using the Extended Kalman Filter (EKF) is implemented, and an important improvement is made by estimating the pose with a modified version of the original filter, namely the Iterated EKF (IEKF). We contribute by applying the IEKF to real-time pose estimation within a high-speed visual servoing system and by providing a detailed sensitivity analysis of both the EKF and the IEKF in the presence of uncertainties. The uncertainties


Figure 6.1: Hardware Setup


include erroneous tuning of the Kalman filter parameters and the initial state guess, feature selection, and the focal length (a camera intrinsic parameter).

6.3.1 Pose Estimation Using Extended Kalman Filter

The theory of the discrete Kalman filter and the EKF is described in [15]. Suppose that the state variable is defined as

x = [X, Ẋ, Y, Ẏ, Z, Ż, ψ, ψ̇, θ, θ̇, φ, φ̇]^⊤,

where the relative pose variables and their corresponding time derivatives are taken into account. The first-order time derivatives (velocities) are zero for a fixed relative pose. This is also the case for the initial pose estimate required to start the EKF-based pose estimation, as the robot is assumed to start the process from rest. The process equation has the form

x_k = A x_{k−1} + ω_k,    (6.1)

where A is block-diagonal, composed of six 2 × 2 blocks of the form [1 T; 0 1] (T being the sampling period), and ω_k is the process noise with covariance Q_k. The measurement equation is nonlinear and is written as

z_k = G(x_k) + ν_k,    (6.2)

where ν_k is the measurement noise with covariance R_k. For simplicity, we consider only three feature points from which the pose variables are estimated; in reality, five or six feature points should be used to improve the reliability of the pose estimates. The measurements are the coordinates of the feature points expressed in the image plane,

z_k = [^S x_1, ^S y_1, ^S x_2, ^S y_2, ^S x_3, ^S y_3]^⊤_k.

Using a pinhole camera model, the nonlinear function in (6.2) is expressed as

G(x_k) = f_C \left[ \frac{^C X_1}{^C Z_1}, \frac{^C Y_1}{^C Z_1}, \frac{^C X_2}{^C Z_2}, \frac{^C Y_2}{^C Z_2}, \frac{^C X_3}{^C Z_3}, \frac{^C Y_3}{^C Z_3} \right].    (6.3)
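A minimal sketch of this measurement function follows. The rotation built from the yaw-pitch-roll angles is an assumption for illustration only; the exact convention used to map object-frame points to the camera frame follows the relative-pose equations of this chapter.

% Sketch: pinhole measurement function G(x) for three feature points.
% P_obj is 3-by-3 (one object-frame point per column); the Euler composition
% below is an assumed convention, not necessarily the one used in the thesis.
function z = measure_G(x, P_obj, fC)
t   = [x(1); x(3); x(5)];                 % relative position [X; Y; Z]
psi = x(7); theta = x(9); phi = x(11);    % relative orientation angles
Rz = [cos(psi) -sin(psi) 0; sin(psi) cos(psi) 0; 0 0 1];
Ry = [cos(theta) 0 sin(theta); 0 1 0; -sin(theta) 0 cos(theta)];
Rx = [1 0 0; 0 cos(phi) -sin(phi); 0 sin(phi) cos(phi)];
R  = Rz * Ry * Rx;                        % assumed yaw-pitch-roll composition
P_cam = R * P_obj + repmat(t, 1, 3);      % feature points in the camera frame
z = zeros(6, 1);
for i = 1:3
    z(2*i-1) = fC * P_cam(1,i) / P_cam(3,i);   % S_x_i
    z(2*i)   = fC * P_cam(2,i) / P_cam(3,i);   % S_y_i
end
end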

The Kalman filter is an optimal estimator only for linear systems in which the process and measurement noise are both white. As seen in (6.3), the pose estimation problem is nonlinear in nature, and hence an optimal solution cannot be reached. However, if the noise is still white, the nonlinear system can be linearized to reach a sub-optimal solution, at the risk of divergence of the pose estimates [3]. It is also assumed that the conditions for linearizing the measurement function are not violated, although this has not been proven mathematically in [3]. If x_k is the state at step k, x̂_{k|k−1} is the a priori state estimate at step k given knowledge of the process at the end of step k − 1, and x̂_{k|k} is the a posteriori state estimate at step k given measurement z_k, then the a priori and a posteriori estimate errors are defined [15] as e_k = x_k − x̂_{k|k} and e_{k|k−1} = x_k − x̂_{k|k−1}. Accordingly, the a priori and a posteriori error covariances are defined as P_{k|k} = E[e_k e_k^⊤] and P_{k|k−1} = E[e_{k|k−1} e_{k|k−1}^⊤].

The nonlinear function in (6.3) can be linearized with the following Jacobian, using a first-order Taylor expansion at x̂_{k|k−1}:

C_k = ∂G(x)/∂x |_{x = x̂_{k|k−1}}.    (6.4)

Once the system is linearized, the update rules of the discrete Kalman filter are used, with the hope that the pose estimates converge. This can be expressed mathematically as follows [3]:

x̂_{k|k−1} = A x̂_{k−1|k−1}    (6.5)
P_{k|k−1} = A P_{k−1|k−1} A^⊤ + Q_{k−1}    (6.6)
S_k = C_k P_{k|k−1} C_k^⊤ + R_k    (6.7)
K_k = P_{k|k−1} C_k^⊤ S_k^{−1}    (6.8)
x̂_{k|k} = x̂_{k|k−1} + K_k (z_k − G(x̂_{k|k−1}))    (6.9)
P_{k|k} = P_{k|k−1} − K_k C_k P_{k|k−1}    (6.10)

The prediction equations are given in (6.5) and (6.6). The Kalman gain is found in (6.8) after the linearization in (6.4). The estimate update equations are given in (6.9) and (6.10).
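A compact sketch of one predict/update cycle implementing (6.5)-(6.10) is given below. The finite-difference helper numjac is a hypothetical stand-in for the linearization in (6.4); G is a handle to the measurement function, e.g., @(x) measure_G(x, P_obj, fC) from the sketch above.

% Sketch: one predict/update cycle of the EKF, following (6.5)-(6.10).
function [x_post, P_post] = ekf_step(x_prev, P_prev, z, A, Q, R, G)
x_pred = A * x_prev;                          % (6.5)
P_pred = A * P_prev * A' + Q;                 % (6.6)
C = numjac(G, x_pred);                        % (6.4), numerical linearization
S = C * P_pred * C' + R;                      % (6.7)
K = P_pred * C' / S;                          % (6.8), K = P C' S^-1
x_post = x_pred + K * (z - G(x_pred));        % (6.9)
P_post = P_pred - K * C * P_pred;             % (6.10)
end

function C = numjac(G, x)
% Forward-difference Jacobian of G at x (illustrative helper only).
g0 = G(x); n = numel(x); m = numel(g0);
C = zeros(m, n); h = 1e-6;
for j = 1:n
    xp = x; xp(j) = xp(j) + h;
    C(:, j) = (G(xp) - g0) / h;
end
end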

An assumption is made that the noise covariance matrices are constant in time. This assumption is reported to be valid for the measurement noise R_k [5]. However, the process noise cannot be constant during a visual servoing experiment, because the relative pose of the camera with respect to the object changes in a nonlinear fashion; this acts as a disturbance that varies with time. If there is a coarse motion, the disturbance changes drastically, and if the covariance is not updated the pose estimate can become unstable. Our assumption is that there are no such coarse motions. The adaptive estimation of the process noise covariance matrix is fully addressed in [16]. More importantly, the standard EKF does not account for linearization errors, which cannot be ignored; an error in the linearization can lead to wrong estimates and/or divergence of the pose estimates. Since the nonlinearity is only in the measurement equation, the IEKF is the best method to deal with it. The linearization errors of the IEKF tend to be smaller than those of the EKF and other Kalman filter variants, because the IEKF uses the measurement itself to linearize [12]. The equations of the IEKF are provided in the following subsection.

6.3.2 Pose Estimation Using Iterated Extended Kalman Filter

The IEKF is explained in [13, 14]. It uses the same prediction equations as the EKF, namely (6.5) and (6.6). Then, starting with x̂_k^0 = x̂_{k|k−1}, i.e., for i = 0, the following equations are performed iteratively for i ≤ N − 1:

C_k^i = ∂G(x)/∂x |_{x = x̂_k^i}    (6.11)
S_k^i = C_k^i P_{k|k−1} (C_k^i)^⊤ + R_k    (6.12)
K_k^i = P_{k|k−1} (C_k^i)^⊤ (S_k^i)^{−1}    (6.13)
r_k^i = z_k − G(x̂_k^i)    (6.14)
x̂_k^{i+1} = x̂_k^i + K_k^i r_k^i    (6.15)

At the end of the iterations, x̂_{k|k} = x̂_k^N. The a posteriori error covariance estimate is updated according to

P_{k|k} = P_{k|k−1} − K_k^N C_k^N P_{k|k−1}.    (6.16)

We have found a suitable number of iterations by inspecting the sum of squared error from the true trajectory. Fig. 6.2 shows this error measure versus the number of iterations and indicates that n = 30 is an optimal value. It should be noted that a fixed number of iterations is not necessary: the iteration can be stopped when the iterated state estimate is close to the previous one, i.e., when (K_k^i r_k^i)^⊤ (K_k^i r_k^i) < τ, where τ is a threshold found from experiments.
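A sketch of the iterative measurement update (6.11)-(6.16) is shown below, reusing the hypothetical numjac helper from the EKF sketch; the early-stopping test follows the threshold criterion described above (tau is experiment-dependent, and at least one iteration is assumed).

% Sketch: IEKF measurement update, following (6.11)-(6.16), with early stopping.
function [x_post, P_post] = iekf_update(x_pred, P_pred, z, R, G, N, tau)
x_i = x_pred;                                 % x_k^0 = x_{k|k-1}
for i = 1:N
    C = numjac(G, x_i);                       % (6.11)
    S = C * P_pred * C' + R;                  % (6.12)
    K = P_pred * C' / S;                      % (6.13)
    r = z - G(x_i);                           % (6.14)
    dx = K * r;
    x_i = x_i + dx;                           % (6.15)
    if dx' * dx < tau, break; end             % optional early stop
end
x_post = x_i;
P_post = P_pred - K * C * P_pred;             % (6.16)
end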

6.3.3 Experimental Sensitivity Analysis

Performance Measurement

The performance of the pose estimation is assessed with an error measure defined in Cartesian space: the sum of squared differences between the estimate and its true trajectory. Alternatively, the stability of the pose estimates can be assessed with a measure ρ_k^i, defined as ρ_k^i = (r_k^i)^⊤ S_k^i r_k^i. The smaller this measure, the more convergent the pose estimates. For the IEKF, ρ_k^N should be used, whereas for the EKF, ρ_k^0 should be used, as there are no iterations.


Figure 6.2: Optimal number of IEKF iterations, reached at n = 30. The error measure for the EKF is shown for comparison with that of the IEKF.

Process Noise Covariance Matrix, Qk−1

We have used a diagonal process noise covariance matrix of the form Qk−1 = diag(0, q, 0, q, 0, q, 0, q, 0, q, 0, q) with q ∈ {10^3, 10, 10^−1, 10^−3, 10^−5, 10^−20} to study the sensitivity to Qk−1. Figures 6.3 and 6.4 show the pose estimates with the different matrices for the EKF and the IEKF, respectively, for a stationary scene. It is seen that the IEKF is less sensitive to the choice of Qk−1; with the IEKF, the burden of tuning this parameter is relaxed to a great degree. The experiment was performed on 100 images taken by a still camera of a stationary scene, purely to study convergence.


Figure 6.3: EKF with different Qk−1. These graphs show the sensitivity of the EKF to inappropriate selection of Qk−1.


Figure 6.4: IEKF with different values of Qk−1 . All six graphs are almost overlaid on each other.


Measurement Noise Covariance Matrix, Rk

The convergence of the EKF pose estimation also depends on the choice of Rk, while the IEKF is more forgiving in the choice of Rk. As the measurement noise is the same for all features and in both directions, a diagonal matrix with equal diagonal elements, r, is chosen. The size of Rk is twice the number of features (there are two measurements per feature). Figures 6.5 and 6.6 show the convergence of the pose estimates for r ∈ {5 × 10^−2, 10^−1, 1, 10, 10^2, 10^3} for the EKF and the IEKF, respectively. The experimental conditions are the same as in Section 6.3.3.


Figure 6.5: EKF with different Rk. These graphs show the sensitivity of the EKF to inappropriate selection of Rk.


Figure 6.6: IEKF with different values of Rk. All six graphs are almost overlaid on each other.


Sampling Time (Visual Servoing Speed), T

Higher visual servoing speed may be attained with IEKF pose estimation. Figures 6.7(a) and 6.7(b) show the position estimates for two different sampling times, 50 ms and 10 ms, respectively. With the IEKF, the actual pose is followed more accurately than with the EKF. The error measure introduced earlier quantifies the superiority of IEKF-based pose estimation, as depicted in Fig. 6.8. It is seen that for high visual servoing bandwidth, the error in the pose estimates increases with a higher slope.


Figure 6.7: Experiments on visual servoing speed. Only the position variable in the x direction is depicted. Solid lines represent the IEKF, dotted lines the EKF, and the plain line the actual trajectory, for two sampling frequencies: (a) 20 Hz and (b) 100 Hz.

Initial State Estimate, x̂_{0|0}

The position displacement can be up to 160 mm in each direction for a fast-converging IEKF pose estimation (reaching steady state within the first five sampling periods), while for the EKF the position displacement should not exceed 40 mm. For orientation, the IEKF tolerates up to 15 degrees of error in each angle, while for the EKF these values must be smaller than 5 degrees.


Figure 6.8: Sampling frequency (visual servoing speed) vs. error measure.

Many experiments have been performed to support the above conclusion, but only the position variables for both methods are shown, in Fig. 6.9, for three sets of equal displacements in each coordinate: 40 mm, 70 mm, and 160 mm. The broader choice of initial state is quite desirable for visual servoing applications, where the starting point can be far from the desired pose.

Feature Extraction

Under IEKF pose estimation, the feature extraction algorithm can tolerate higher uncertainty. Any step error in the image processing would cause an unstable EKF estimate. Since the IEKF uses the measurements as feedback, it adaptively adjusts the Kalman gain to reduce the error and hence can tolerate a higher level of uncertainty. This was found by applying step disturbances, as well as white noise, to the measurements, but the results are not shown here.


Figure 6.9: Comparison of position estimates for three sets of equal displacements in each coordinate: 40mm, 70mm, and 160mm. Only position variables are shown for both EKF and IEKF.


Focal Length, fC

Camera calibration always provides the intrinsic parameters, such as the focal length and the image centre, with some degree of uncertainty. We tested the tolerance of both methods to changes in the focal length fC. After camera calibration, fC was found to be 3.700 mm. Figures 6.10 and 6.11 show the results of experiments with fC ∈ {3.675, 3.680, 3.685, 3.690, 3.695, 3.700} mm. It is seen that with slightly different focal lengths the IEKF yields different depths, which is logical, as the focal length enters the projection equations; however, it is worth noting that only with the IEKF are the pose estimates stable.


Figure 6.10: EKF with different values of fC. These graphs show the sensitivity of the EKF to inappropriate selection of fC.


Figure 6.11: IEKF with different values of fC . All six graphs are almost overlaid on each other, except for estimates in the z-axis.


6.3.4 Tuning of Filter Parameters

Tuning of the Kalman filter is very important to achieve a small and reasonable tracking delay. The tuning process includes identification of the measurement and process noise covariance matrices. The measurement noise covariance matrix Rk can be determined by off-line sample measurements, as explained in [66]. However, determination of the process noise covariance matrix Qk−1 is not straightforward. Furthermore, Qk−1 is not constant in time and changes as the motion profile changes during the visual servoing process. If Qk−1 is chosen to include enough uncertainty, acceptable results can be attained [15]; alternatively, Qk−1 can be adaptively adjusted to the motion at the expense of some computational cost [16]. We tune these two covariance matrices off-line, i.e., by scanning a reasonable range for the diagonal elements of the two matrices (the off-diagonal covariances are assumed to be zero, i.e., uncorrelated noise) and minimizing the error measure introduced in Section 6.3.3. In addition, the same error measure is optimized over a given range of the sampling period T.
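A minimal sketch of this off-line scan is given below. The helper run_filter_error is hypothetical: it stands for running the filter over recorded images and returning the Cartesian error measure of Section 6.3.3, and the grid ranges are only examples based on the values reported there.

% Sketch: off-line grid scan of the diagonal filter parameters q and r.
numFeatures = 3;                            % number of feature points used
q_grid = 10.^(-20:1:3);                     % example range for the process noise level
r_grid = 10.^(-2:1:3);                      % example range for the measurement noise level
best = inf;
for q = q_grid
    Q = diag(repmat([0 q], 1, 6));          % Q = diag(0, q, 0, q, ..., 0, q)
    for r = r_grid
        R = r * eye(2*numFeatures);         % two measurements per feature
        e = run_filter_error(Q, R);         % hypothetical helper: error over logged data
        if e < best
            best = e; Q_best = Q; R_best = R;
        end
    end
end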

6.3.5 Summary

In Section 6.3, we investigated the details of using IEKF pose estimation in high-speed robotic visual servoing (RVS). Through numerous experiments and quantitative comparison of an error index, it was found that the IEKF is superior to the EKF in several respects, the most important of which is the ease of tuning the Kalman filter, as shown in Section 6.3.3. In addition, the proposed approach enables the RVS system to operate at higher speeds, given that a fast enough image processing unit is available. The smaller the sampling period in the Kalman filter equations, the less reliable the velocity estimates between consecutive samples; this imposes a lower bound on the sampling period, the inverse of which determines the maximum theoretical bandwidth of the visual servoing system in frames per second (also referred to as Hz in the RVS community). The bottleneck in the state-of-the-art literature for reaching a very high-speed RVS platform originates from image acquisition and robust image processing: the manipulator servo control loop works in the range of a couple of kHz, while the pose estimation, as outlined in Section 6.3.3, performs in the range of 20-500 Hz (2-50 ms per frame). In addition, it is seen that inaccurate camera calibration results in divergent pose estimates with the EKF; even though the pose estimation remains convergent with the IEKF, there are still discrepancies in the depth estimation. While the EKF performs adequately within a position-based visual servoing (PBVS) structure, its performance is bounded by the choice of filter parameters and by external uncertainties. We have shown that with a modified version of the EKF, namely the Iterated EKF (IEKF), better performance at higher speeds can be reached. The IEKF suits cases in which the nonlinearity is only in the measurement equation, and its added computational cost is not significant, given that the bottleneck in real-time pose estimation for PBVS is the speed of the feature extraction module.


Chapter 7

Conclusions and Future Work

7.1 Conclusions

A new theory of the optimal design of features for vision-guided manipulation was presented in this work. The theory was built upon the fundamentals of axiomatic design theory with a rich blend of machine vision algorithms. The problem was formulated in a multi-objective formalism, where the different objectives originated from different measures. The visual measures were the field-of-view, focus, and pixel resolution. In the context of the visual measures, the problem was solved as the inverse of the vision sensor planning problem. A new directional motion resolvability measure was also introduced; the optimal design for this measure guarantees motion resolvability along the camera trajectory. A GUI was developed for ease of simulation. The simulation results were shown for the optimal design with respect to the directional motion resolvability measure. In addition, the visual measures were combined with the directional motion resolvability measure, and the simulation results were again shown and discussed. The implementation of the theory requires three major tasks: (1) calibrating the camera, (2)


performing image processing and pose estimation, and (3) experiments with the redesigned object in the setup. The first two tasks have been completed successfully with an important improvement to the pose estimation methodology. The experiments have been designed and are set forth as future experimental work.

7.2 Future Theoretical Work

7.2.1 Scale-invariant feature points in design for vision-guided manipulation

An important direction for future work is the continuation of a pilot study on selecting a robust set of Scale-Invariant Feature Transform (SIFT) features [29]. The aim of such research would be to design features based on their invariance across different scales and views as the camera moves along a trajectory. The authors do not yet have a solid result or a mature theory in this regard, but they believe that a design approach combining geometric properties with invariant appearance properties would lead to more robust vision-guided manipulation.

7.2.2 Control stability measures in design for vision-guided manipulation

In the presented theory, visual measures were used to satisfy machine vision requirements. In addition to the visual measures, the image Jacobian matrix was used to develop a directional motion resolvability measure; this measure guards against image singularities along the camera trajectory when the control strategy utilizes the image Jacobian. To ensure control stability under different control strategies during the visual servoing task, other stability measures must also be introduced. This could be the topic of interesting, and yet challenging, future research.

7.3 Future Experimental Work

Future experiments would include the redesign of industrial objects in practice. The simulation results should be compared with the experimental results to validate the implementation. A meaningful continuation of the research in this direction would be the generalization of the optimization process to large numbers of design parameters, together with appropriate handling of local minima.


References

[1] S. Hutchinson, G. D. Hager, and P. I. Corke, “A tutorial on visual servo control,” IEEE Trans. Robotics and Automation, vol. 12, no. 5, pp. 651–670, Oct. 1996.
[2] P. I. Corke, Visual Control of Robots: High-performance Visual Servoing. Somerset, UK: Research Studies Press, 1996.
[3] W. J. Wilson, C. C. W. Hulls, and G. S. Bell, “Relative end-effector control using cartesian position based visual servoing,” IEEE Trans. Robotics and Automation, vol. 12, no. 5, pp. 684–696, Oct. 1996.
[4] B. Espiau, F. Chaumette, and P. Rives, “A new approach to visual servoing in robotics,” IEEE Trans. Robotics and Automation, vol. 8, no. 3, pp. 313–326, June 1992.
[5] L. Deng, “Comparison of image-based and position-based robot visual servoing methods and improvements,” PhD Thesis, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, 2003.
[6] F. Janabi-Sharifi, “Visual servoing: Theory and applications,” in Handbook of OptoMechatronic Systems, H. Cho, Ed. Boca Raton, FL: CRC Press, 2003.
[7] R. Haralick, H. Joo, C. Lee, X. Zhuang, V. Vaidya, and M. B. Kim, “Pose estimation from corresponding point data,” IEEE Trans. Systems, Man, and Cybernetics, vol. 19, pp. 1426–1446, 1989.

[8] C. Lu, G. Hager, and E. Mjolsness, “Fast and globally convergent pose estimation from video images,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, pp. 610–622, 2000.
[9] R. Kumar and A. Hanson, “Robust methods for estimating pose and a sensitivity analysis,” CVGIP: Image Understanding, vol. 60, pp. 313–342, 1994.
[10] J. Wu, R. Rink, T. Caelli, and V. Gourishankar, “Recovery of the 3D location and motion of a rigid object through camera image (an extended Kalman filter approach),” Intl. J. Computer Vision, vol. 3, pp. 373–394, 1988.
[11] S. J. Julier and J. K. Uhlmann, “Unscented filtering and nonlinear estimation,” Proceedings of the IEEE, vol. 92, pp. 401–422, 2004.
[12] T. Lefebvre, H. Bruyninckx, and J. De Schutter, “Kalman filters for nonlinear systems: A comparison of performance,” Intl. J. of Control, vol. 77, no. 7, pp. 639–653, May 2004.
[13] Y. Bar-Shalom and X. Li, Estimation and Tracking: Principles, Techniques, and Software. Boston, MA: Artech House, 1993, pp. 625–655.
[14] Z. Zhang and O. Faugeras, 3D Dynamic Scene Analysis: A Stereo Based Approach. Berlin, Germany: Springer-Verlag, 1992.
[15] G. Welch and G. Bishop, “An introduction to the Kalman filter,” Computer Science, UNC Chapel Hill, Tech. Rep. TR 95-041, 1995, revised April 2004.

[16] M. Ficocelli and F. Janabi-Sharifi, “Adaptive filtering for pose estimation in visual servoing,” in Proc. IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS 2001), 2001, pp. 19–24.
[17] C. Tomasi and T. Kanade, “Detection and tracking of point features,” Carnegie Mellon University, Technical Report CMU-CS-91-132, 1991.
[18] J. Shi and C. Tomasi, “Good features to track,” in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR '94), 21-23 June 1994, pp. 593–600; implementation: http://vision.stanford.edu/~birch/klt/.
[19] N. P. Papanikolopoulos, B. J. Nelson, and P. K. Khosla, “Six degree-of-freedom hand/eye visual tracking with uncertain parameters,” IEEE Trans. on Robotics and Automation, vol. 11, no. 5, pp. 725–732, 1995.
[20] B. Nelson, P. Papanikolopoulos, and P. Khosla, Visual Servoing: Real-Time Control of Robot Manipulators Based on Visual Sensory Feedback, ser. World Scientific Series in Robotics and Automation, 1993, vol. 7, ch. Visual Servoing for Robotic Assembly, pp. 139–164.
[21] B. Vikramaditya, “Micropositioning using active vision techniques,” Master's thesis, Mechanical Engineering, University of Illinois at Chicago, 1997.
[22] J. T. Feddema, C. S. G. Lee, and O. R. Mitchell, “Weighted selection of image features for resolved rate visual feedback control,” IEEE Trans. Robotics and Automation, vol. 7, no. 1, pp. 31–47, 1991.
[23] M. Ficocelli, “State estimation manager for visual servoing,” MESc Thesis, Faculty of Graduate Studies, The University of Western Ontario, London, 2001.

[24] W. J. Wilson, C. C. W. Hulls, and F. Janabi-Sharifi, Robust Vision for Vision-Based Control of Motion. Piscataway, NJ: IEEE Press, 2000, ch. Robust image processing and position-based visual servoing, pp. 163–201.
[25] A. Shademan, “Windowing-based image processing and pose estimation for position-based visual servoing under MATLAB,” Ryerson University, Robotics and Manufacturing Automation Lab, Toronto, ON, Tech. Rep. TR-2005-05-24-01, May 2005.
[26] F. Janabi-Sharifi and W. J. Wilson, “Automatic selection of image features for visual servoing,” IEEE Trans. Robotics and Automation, vol. 13, no. 6, pp. 890–903, Dec. 1997.
[27] M. Paulin, “Combining motion planning and visual servoing: new methods for robust visual control of robots,” PhD Dissertation, The Maersk Mc-Kinney Moller Institute for Production Technology, University of Southern Denmark, Sept. 2004.
[28] F. Janabi-Sharifi and M. Ficocelli, “Formulation of radiometric feasibility measures for feature selection and planning in visual servoing,” IEEE Trans. Systems, Man, and Cybernetics: Part B, vol. 34, no. 2, pp. 978–987, Apr. 2004.
[29] A. Shademan and F. Janabi-Sharifi, “Using scale-invariant feature points in visual servoing,” in Proceedings of SPIE, Vol. 5603, OpticsEast 2004: Machine Vision and Its Optomechatronic Applications Conference, ser. Proc. SPIE, 2004, pp. 341–346.
[30] G. Boothroyd, Assembly Automation and Product Design. 270 Madison Avenue, New York, New York 10016: Marcel Dekker, Inc., 1992.
[31] A. Shademan and F. Janabi-Sharifi, “Feature design for optimum directional motion resolvability,” in Proceedings of SPIE, SPIE International Symposium on Optomechatronic Technologies, vol. 6502, Sapporo, Japan, 5-7 December 2005, pp. 605207-1–605207-11.

[32] ——, “Sensitivity analysis of EKF and iterated EKF pose estimation for position-based visual servoing,” in Proceedings of IEEE Conference on Control and Applications, Toronto, ON, 29-31 August 2005, pp. 755–760.
[33] J. J. Shah and M. Mantyla, Parametric and Feature-Based CAD/CAM: Concepts, Techniques, and Applications. 605 Third Avenue, New York, NY 10158-0012: John Wiley & Sons, Inc., 1995.
[34] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[35] N. P. Suh, The Principles of Design, ser. Oxford Series on Advanced Manufacturing. 200 Madison Avenue, New York, New York 10016: Oxford University Press, Inc., 1990.
[36] P. Y. Papalambros and D. J. Wilde, Principles of Optimal Design: Modeling and Computation. New York, NY: Cambridge University Press, 1988, reprinted 1993.
[37] R. Fletcher, Practical Methods of Optimization. Chichester: Wiley, 1981.
[38] R. T. Marler and J. S. Arora, “Survey of multi-objective optimization methods for engineering,” Structural and Multidisciplinary Optimization, vol. 26, no. 6, pp. 369–395, 2004.
[39] Optimization Toolbox for Use with MATLAB, Version 3 ed., The MathWorks, September 2005.

[40] K. A. Tarabanis, R. Y. Tsai, and P. K. Allen, “The MVP sensor planning system for robotic vision tasks,” IEEE Trans. Robotics and Automation, vol. 11, no. 1, pp. 72–85, February 1995.
[41] Y. Yu and K. Gupta, “An information theoretical approach to view planning with kinematic and geometric constraints,” in Proc. IEEE Intl. Conf. Robotics and Automation (ICRA '01), Seoul, Korea, May 21-26 2001, pp. 1948–1953.
[42] P. K. C. Wang, “Optimal path planning based on visibility,” J. Optimization Theory and Applications, vol. 117, no. 1, pp. 157–181, April 2003.
[43] V. Sequeira, J. G. M. Goncalves, and M. I. Ribeiro, “Active view selection for efficient 3D scene reconstruction,” in Proc. 13th International Conference on Pattern Recognition, vol. 1, 25-29 Aug. 1996, pp. 815–819.
[44] V. Sequeira and J. G. M. Goncalves, “3D reality modelling: photo-realistic 3D models of real world scenes,” in Proc. First International Symposium on 3D Data Processing Visualization and Transmission, 19-21 June 2002, pp. 776–783.
[45] K. Klein and V. Sequeira, “View planning for the 3D modelling of real world scenes,” in Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2000), vol. 2, 31 Oct.-5 Nov. 2000, pp. 943–948.
[46] G. Bostrom, M. Fiocco, A. R. D. Puig, J. G. M. Goncalves, and V. Sequeira, “Acquisition, modeling and rendering of very large urban environments,” in Proc. 2nd International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), Sept. 2004, pp. 191–198.

[47] P. Dias, V. Sequeira, F. Vaz, and J. G. M. Goncalves, “Combining intensity and range images for 3D modelling,” in Proc. International Conference on Image Processing (ICIP 2003), vol. 1, 14-17 Sept. 2003, pp. I-417–420.
[48] P. J. Tremblay and F. P. Ferrie, “The skeptical explorer: A multiple-hypothesis approach to visual modeling and exploration,” Autonomous Robots, vol. 8, pp. 193–201, 2000.
[49] Y. Ye and J. K. Tsotsos, “Sensor planning for 3D object search,” Computer Vision and Image Understanding, vol. 73, no. 2, pp. 145–168, February 1999.
[50] K. A. Tarabanis, P. K. Allen, and R. Y. Tsai, “A survey of sensor planning in computer vision,” IEEE Trans. Robotics and Automation, vol. 11, no. 1, pp. 86–104, February 1995.
[51] W. R. Scott, G. Roth, and J. F. Rivest, “View planning for automated three dimensional object reconstruction and inspection,” ACM Computing Surveys, vol. 35, no. 1, pp. 64–96, March 2003.
[52] C. K. Cowan and P. Kovesi, “Automatic sensor placement from vision task requirements,” IEEE Trans. Pattern Analysis Machine Intelligence, vol. 10, pp. 407–416, May 1988.
[53] K. Tarabanis and R. Y. Tsai, “Computing viewpoints that satisfy optical constraints,” in Proc. Computer Vision Pattern Recognition, 1991, pp. 152–158.
[54] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, ISBN: 0521623049, 2000.
[55] L. E. Weiss, A. C. Sanderson, and C. P. Neuman, “Dynamic sensor based control of robots with visual feedback,” IEEE Transactions on Robotics and Automation, vol. 3, pp. 404–417, 1987.

[56] B. J. Nelson and P. K. Khosla, “The resolvability ellipsoid for sensor based manipulation,” Carnegie Mellon University, Tech. Rep. CMU-RI-TR-93-28, 1993.
[57] H. Michel and P. Rives, “Singularities in the determination of the situation of a robot effector from the perspective view of 3 points,” INRIA, Technical Report 1850, 1993.
[58] T. Yoshikawa, Robotics Research 2. Cambridge, MA: MIT Press, 1985, ch. Manipulability of Robotic Mechanisms, pp. 439–446.
[59] J.-O. Kim and P. K. Khosla, “Dexterity measures for design and control of manipulators,” in Proc. IEEE/RSJ Intl. Workshop on Intelligent Robots and Systems (IROS '91), Nov. 1991, pp. 758–763.
[60] L. Zlajpah, “Dexterity measures for optimal path control of redundant manipulators,” in Proceedings of 5th Int. Workshop on Robotics in Alpe-Adria-Danube Region, Budapest, Hungary, 1996, pp. 85–90.
[61] B. J. Nelson and P. K. Khosla, “Vision resolvability for visually servoed manipulation,” Journal of Robotic Systems, vol. 13, no. 2, pp. 75–93, 1996.
[62] ——, “Force and vision resolvability for assimilating disparate sensory feedback,” IEEE Transactions on Robotics and Automation, vol. 12, pp. 714–731, 1996.
[63] R. Sharma and S. A. Hutchinson, “Motion perceptibility and its application to active vision-based servo control,” IEEE Transactions on Robotics and Automation, vol. 13, no. 4, pp. 607–617, August 1997.

[64] Concept-1: Parts and Tuning for Volkswagen and Audi, Retrieved November 12, 2005, from http://www.concept1.ca/.
[65] Camera User's Manual, Dalstar CA-D1: DS-11-16K7H, DS-12-16K5H, High-Speed Progressive Scan Area Cameras, Dalsa, 29/01/2002, 03-32-00176, Rev. 06.
[66] J. Wang, “Optimal estimation of 3D relative position and orientation for robot control,” MASc Thesis, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, 1991.
[67] R. Y. Tsai, “A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses,” IEEE Journal of Robotics and Automation, vol. 3, no. 4, pp. 323–344, Aug. 1987.
[68] M. Ficocelli, “Camera calibration, intrinsic parameters,” Robotics and Manufacturing Automation Lab, Ryerson University, Technical Report TR-1999-12-17-01, Dec. 1999.
[69] Z. Zhang, “Flexible camera calibration by viewing a plane from unknown orientations,” in International Conference on Computer Vision (ICCV '99), Corfu, Greece, September 1999, pp. 666–673.
[70] ——, “A flexible new technique for camera calibration,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, Nov. 2000.

[71] J.-Y. Bouguet, “Camera calibration toolbox for MATLAB,” MRL, Intel Corp., Online: http://www.vision.caltech.edu/bouguetj/calib_doc/, Oct. 2004.
[72] W. J. MacLean, “ECE1772: Motion analysis in computer vision, lecture notes,” University of Toronto, Electrical and Computer Engineering Department, 2004.
[73] J. Heikkilä and O. Silvén, “A four-step camera calibration procedure with implicit image correction,” in Proceedings of IEEE Computer Vision and Pattern Recognition Conf., 17-19 June 1997, pp. 1106–1112.

Appendix A

Camera Calibration

A.1 Introduction

Computation of the intrinsic and extrinsic camera parameters from a number of known 3D-2D correspondences between the world frame and the image plane is referred to as camera calibration. The matrix representing the extrinsic parameters rotates and translates the world frame onto the camera frame. The intrinsic parameters are, in fact, the optical and geometric characteristics of the camera, such as the focal length and the radial distortion of the lens [67]. The importance of accurate camera calibration lies in the accuracy required by 3D robot vision algorithms. Camera calibration is necessary to find the 2D projection on the image plane of a known 3D point in the world frame. Accurate camera calibration is very important in vision-guided control of manipulators: when the calibration is erroneous, the vision algorithm fails gracelessly. Our pose estimation methodology and experiments are based upon the pinhole camera model. In this appendix, we briefly describe the pinhole camera model, the distortion models due to lens imperfections, and the camera matrix projection.

Figure A.1: Illustration of the ideal pinhole camera model. A corner feature point in the task frame is projected onto a point in the sensor plane. The task frame and the world frame can be assumed to be aligned with each other.

Tsai's standard calibration has been previously used to calibrate the camera [68]. The flexible camera calibration method introduced by Zhengyou Zhang [69, 70] is used in the implementation of the Camera Calibration Toolbox for MATLAB® by Jean-Yves Bouguet [71]. This toolbox has been used to calibrate the DALSA camera. The results are compared with the previous camera calibration parameters in [25].

A.2 Camera Model

In 3D computer vision, a pinhole camera model is usually used. In the ideal pinhole model (Fig. A.1), it is assumed that no lens is present, but rather a very small aperture. This model is widely used in 3D robot vision, as it serves as a good model for perspective projection [72]. A camera matrix projects the homogeneous point ^T P = [^T X, ^T Y, ^T Z, 1]^⊤ in the object frame (^T P ∈ P³) onto the point ^S p = [x, y, z]^⊤, where the homogeneous coordinates of the projection on the image plane (^S p ∈ P²) are found from ^S x = x/z and ^S y = y/z. The camera projection model can be posed as

^S p = M ^T P.    (A.1)

The camera matrix M is found from the multiplication of the intrinsic and extrinsic camera matrices. The extrinsic camera matrix is obtained from the camera calibration, purely from the rotation and translation of the camera frame, according to

M_ext = [R | −RT],    (A.2)

where R is the rotation matrix that rotates the world frame onto the camera frame and T is the translation that takes the origin of the world frame onto the origin of the camera frame. The intrinsic camera matrix can also be found from camera calibration and has the general form

M_int = \begin{bmatrix} f_C/s_x & 0 & C_x \\ 0 & f_C/s_y & C_y \\ 0 & 0 & 1 \end{bmatrix},    (A.3)

where f_C is the focal length of the camera, s_x and s_y are the pixel spacings along the x and y coordinates, respectively, and the piercing point is taken as [C_x, C_y]^⊤. A rigid-body transformation characterized by M_ext transforms points from the object frame to the camera frame according to

^C P = M_ext ^T P,    (A.4)

where ^T P is the 3D point expressed in the object frame, ^C P is the same point expressed in the camera frame, and M_ext is the extrinsic camera matrix. The next step is the transformation of the 3D point in the camera frame to its ideal 2D projection on the image plane using the pinhole model:

\begin{bmatrix} f_C\,^C X \\ f_C\,^C Y \\ ^C Z \end{bmatrix} = M_{int}\, ^C P.    (A.5)

Note that imperfections such as image displacement and radial distortion have not yet been considered:

^S x_u = f_C \frac{^C X}{^C Z},    ^S y_u = f_C \frac{^C Y}{^C Z},    (A.6)

where ^S x_u and ^S y_u are the undistorted coordinates. This is a nonlinear transformation.

Assuming a perspective model in (A.3), where s_x = s_y = 1 and C_x = C_y = 0, and alignment of the world frame and the camera frame in (A.2), without loss of generality the camera matrix M is found from

M = M_{int} M_{ext} = \begin{bmatrix} f_C & 0 & 0 \\ 0 & f_C & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} f_C & 0 & 0 & 0 \\ 0 & f_C & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}.    (A.7)
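A minimal sketch of this ideal projection pipeline, following (A.1)-(A.3), is given below; all numerical values are placeholders for illustration, not calibration results.

% Sketch: ideal pinhole projection of an object-frame point, per (A.1)-(A.3).
fC = 3.7; sx = 1; sy = 1; Cx = 0; Cy = 0;        % intrinsic parameters (placeholders)
R  = eye(3); T = [0; 0; -500];                    % extrinsic parameters (placeholders)
Mint = [fC/sx, 0, Cx; 0, fC/sy, Cy; 0, 0, 1];     % (A.3)
Mext = [R, -R*T];                                 % (A.2), a 3-by-4 matrix
M    = Mint * Mext;                               % camera matrix
P_T  = [10; 20; 0; 1];                            % homogeneous object-frame point (placeholder)
p    = M * P_T;                                   % (A.1)
Sx   = p(1) / p(3);                               % ideal image-plane coordinates
Sy   = p(2) / p(3);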

Most real camera-lens systems cannot be considered ideal, since distortion and image displacement occur in most cases. With a non-ideal pinhole camera model, there are two additional steps to transform a 3D point onto the 2D image plane [67]:

1. Radial lens distortion:

^S x_d = ^S x_u − D_x,    ^S y_d = ^S y_u − D_y,    (A.8)

where ^S x_d and ^S y_d are the distorted (true) coordinates on the image plane, and D_x and D_y are the radial distortion compensation terms, parameterized by the distortion coefficients κ_n as follows:

D_x = ^S x_d (κ_1 r² + κ_2 r⁴ + ...),
D_y = ^S y_d (κ_1 r² + κ_2 r⁴ + ...),
r = \sqrt{(^S x_d)² + (^S y_d)²}.

2. Transformation of the real image coordinates [^S x_d, ^S y_d]^⊤ to the image coordinates obtained from the grabbed image, [^S x_f, ^S y_f]^⊤:

^S x_f = ^S x_d / s_x + C_x,
^S y_f = ^S y_d / s_y + C_y,

where [C_x, C_y]^⊤ is the image centre and [s_x, s_y]^⊤ is the pixel spacing.
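These two steps can be sketched as follows; only the first two radial terms are kept, and the coefficients and coordinates are placeholders, not calibrated values.

% Sketch: distortion compensation (A.8) and pixel-coordinate transform.
k1 = -1.45; k2 = 0;                 % radial distortion coefficients (placeholders)
sx = 1; sy = 1; Cx = 64; Cy = 64;   % pixel spacing and image centre (placeholders)
Sxd = 0.05; Syd = -0.02;            % distorted (true) image-plane coordinates (placeholders)
r2  = Sxd^2 + Syd^2;
Dx  = Sxd * (k1*r2 + k2*r2^2);
Dy  = Syd * (k1*r2 + k2*r2^2);
Sxu = Sxd + Dx;                     % undistorted coordinates recovered from (A.8)
Syu = Syd + Dy;
Sxf = Sxd/sx + Cx;                  % pixel coordinates in the grabbed image
Syf = Syd/sy + Cy;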

A.3 Experiments

An extensive comparison of different camera calibration methods is presented in [25]. In that technical report, it is shown that the flexible camera calibration of Zhengyou Zhang [69, 70] (implemented in [71]) has several advantages over other methods. Table A.1 summarizes the different notations used by different authors and is presented to avoid confusion. We shall not repeat the content of the technical report, but rather provide the final results and the error analysis of the camera calibration process. Three planar checkerboard patterns were designed to calibrate the camera intrinsic parameters. The main reason for designing different patterns was to compare the calibrated parameters and integrate the results that had small uncertainties.

Table A.1: Different Notations for Camera Intrinsic Parameters

Description             Bouguet [71]          Tsai [67]              Heikkilä [73]          Ficocelli's code [68]
Focal length [pix]      fc(1), fc(2)          f·sx/dPx, f/dPy        f·Du·su, f·Dv          ALP*FOC_N, FOC_N
Principal point [pix]   cc(1), cc(2)          Cx, Cy                 u0, v0                 CX, CY
Radial dist.            kc(1), kc(2), kc(5)   κ1·f³, κ2·f⁵, κ3·f⁷    k1·f³, k2·f⁵, k3·f⁷    K1·FOC³, K2·FOC⁵, 0
Tangential dist.        kc(3), kc(4)          d1·f², d2·f²           p1·f², p2·f²           P1·FOC², P2·FOC²
Skew coefficient        alpha_c               0                      0                      0

As there are numerical errors involved in the optimization process, only the results with smaller uncertainties are kept and the rest are discarded. The experiment consists of the following main steps: (1) capture images of the checkerboard plane at different orientations, (2) run the Camera Calibration Toolbox from the MATLAB® environment, (3) load the images, (4) extract the grid corners, (5) estimate the camera parameters, (6) examine the uncertainties, (7) remove outliers, and finally (8) re-estimate the parameters without the outliers. Figure A.2 shows the uncertainty graph, Table A.2 presents the final parameters, and Figure A.3 illustrates the extrinsic camera parameters from the calibration rig.


Figure A.2: Error analysis plot. Colour coding refers to different images.

Table A.2: Estimated calibration parameters.

Parameter   Estimated
fc(1)       231.4119
fc(2)       231.2500
cc(1)       64.2
cc(2)       61.5
kc(1)       -1.4487
kc(2)       -1.3175
kc(3)       0.0006845
kc(4)       0.0006845

Figure A.3: Illustration of the camera extrinsic parameters.
