IEEE/RSJ IROS 2005 Workshop on

Robot Vision for Space Applications

Aug 2, 1:30 PM to 5 PM in Salon 15, Shaw Conference Centre, Edmonton, Alberta, Canada

Organizers: Leo Hartman, Martin Jagersand, Piotr Jasiobedzki, Ioannis Rekleitis
Proceedings editor: Martin Jagersand

Schedule and Proceedings Table of Contents

Time   Talk                                                                                Page
13:30  Welcome and introductory lecture on Robot Vision for the Space Shuttle and Space
       Station Robot Arms. Piotr Jasiobedzki, MD Robotics                                     3
14:10  Satellite Pose Acquisition and Tracking with Variable Dimensional Local Shape
       Descriptors. Babak Taati and Michael Greenspan, Queen’s University                     4
14:30  Development of a Scaled Ground Testbed for Lidar-based Pose Estimation.
       Andrew C.M. Allen, Nikolai N. Mak, Christopher S. Langley, MDA Corporation            10
14:50  Coffee break
15:10  Scalable real-time vision-based SLAM for planetary rovers. Robert Sim, Matt Griffin,
       Alex Shyr, and James J. Little, UBC                                                   16
15:30  Estimation of the aerodynamical parameters of an experimental airship. Diego Patino,
       Leonardo Solaque, Simon Lacroix and Alain Gauthier, LAAS CNRS and Andes University    22
15:50  End-point Control of a Flexible Structure Mounted Manipulator Based on Wavelet Basis
       Function Networks. David X.P. Cheng, Defence Research and Development Canada          29
16:10  ARTag Fiducial Marker System Applied to Vision Based Spacecraft Docking.
       Mark Fiala, National Research Council                                                 35
16:30  End notes and survey of Space Agency activities. Leo Hartman and Ioannis Rekleitis,
       CSA                                                                                   41

IEEE/RSJ IROS 2005 Workshop on

Robot Vision for Space Applications

Introduction

The demanding nature of space operations obliges continuous innovation while at the same time requiring high reliability and robustness. This provides a "trial by fire" for any new research, and subsequently, after having been space qualified, much of this technology makes it into other applications. Of particular interest for the IROS audience are the high demands that space applications put on robotic hardware and software. For instance, weight and power constraints force hardware designers to depart from the rigid and heavy manipulators used in terrestrial manufacturing. In turn these new lightweight designs pose new challenges in controller and software development, operations (e.g., the flexibility in the space shuttle arm), the types of feedback used (e.g., joint feedback is not sufficient for a non-rigid arm, but end-point feedback from visual tracking or other wide area sensing is needed), and how they are employed to solve tasks (e.g., supervisory control that stratifies the human-robot interface, but does not require full robot autonomy).

Vision and other wide area sensing technologies are critical to future space robotics applications in order to allow safe interaction with the environment and efficient environment modeling. In addition to the usual challenges of environment modeling, target acquisition and tracking, planning and monitoring, space vision systems have to address additional problems including qualification for operation in space, severe lighting variation, limited onboard computing resources and limited viewpoints and mobility. A space vision system also has to be tightly integrated into other onboard subsystems (robot control system, command and telemetry, onboard networking, real-time infrastructure) in such a way that its behavior is always well defined, that its performance can be monitored and verified, and that in the interest of fault tolerance the system of which it is a part can work with it and without it. Due to the mission and safety critical status of space robotic equipment, a space vision system controlling such equipment must be highly reliable. If someone has to monitor it, using it will save little human effort.

The workshop will bring together researchers interested in robotic vision methods particularly suitable for space applications. A particular aim for the workshop is to connect academic researchers with applied problems in space industry and institutes. To support this objective we will have participation from the Canadian Space Agency (CSA) and MD Robotics (builder of the robotic arms on the space shuttle and international space station). The program was selected from submitted papers and spans academic, government and industrial research. Each paper was reviewed by two or three reviewers. We wish to thank the reviewers for their efforts. The paper acceptance rate was 46%.

Many of the new technologies will not only improve robotics capabilities in space but promise to also widen the applicability of robotics on earth and take the field beyond its traditional center of gravity in manufacturing into other fields such as exploration, mining and natural resource extraction, emergency response, decontamination, etc. Therefore we expect this workshop to be of interest not only to those who currently work in space applications but also to those who want to stay up to date on recent progress and get an idea of what technologies may be ripe to transfer between space and other application areas.
We hope the workshop will be an interesting and rewarding experience for all participants.

The organizers:
Leo Hartman, Canadian Space Agency, [email protected]
Martin Jagersand, University of Alberta, [email protected]
Piotr Jasiobedzki, MD Robotics, [email protected]
Ioannis Rekleitis, Canadian Space Agency, [email protected]

IEEE IROS 2005 Workshop on Robot Vision for Space Applications

Invited Lecture

Robotic Vision for Autonomous Operations in Space

Piotr Jasiobedzki
MDA Space Missions, 9445 Airport Rd., Brampton ON, Canada

Abstract

Space missions have involved humans to perform complex tasks requiring perception, decision making, and dexterity. Inherent dangers of space travel, complexity of space systems and paramount safety requirements have been driving up the cost of manned missions. Ground based control via teleoperation offers operator safety, employs remote sensors to provide information for human decision making, and uses remote robots to perform tasks. Communication links with low latency, and high bandwidth and availability, are essential for teleoperation. Such links are expensive for Earth orbiting spacecraft and are not at all available for planetary systems. One solution is to increase the autonomy of remote agents (servicer spacecraft, planetary rovers, and exploration probes) by providing them with abilities to make their own decisions using data from on-board sensors. Multi-dimensional data such as camera images and three-dimensional measurements from rangefinders offer the most complete representations of observed scenes. Vision systems can be used to process and analyse such data, facilitating interpretation and autonomous decision making.

Applications of vision systems for Earth orbit focus primarily on autonomous servicing. Vision guided rendezvous of an unmanned servicer spacecraft with a target satellite may start at a distance of several kilometers and conclude with docking under vision system control. Alternatively, the servicer may hover in proximity of the target and capture the target with a robotic arm controlled in a visual servoing mode. Robotic servicing will rely on vision systems to model workspaces in 3D and to compare them with blueprints, as well as to guide end-effectors and tools during operations. Planetary applications that require vision technologies may include landing systems that can guide a lander to a safe landing site and allow for precision landing. Surface exploration with rovers requires vision systems to provide data for local and long range navigation, localization and terrain mapping. Virtual presence and mission monitoring require an ability to create photorealistic and three dimensional models of visited sites.

Challenges of computer vision for use on-orbit include the high contrast and high dynamic range of scenes caused by the intense sun illumination of foreground objects and the lack of diffuse lighting of background objects, and the complex shapes of space structures and their featureless and reflective surfaces. Highly unstructured and self-similar planetary surfaces pose challenges for vision systems for use in planetary exploration, as these systems will have to not only detect local obstacles but also incrementally create terrain maps that can be used for vehicle localization and future terrain traversals.

MDA Space Missions has been developing a range of space vision technologies, such as the Spaceborne Scanning Lidar that has been operating since April 2005 on the XSS-11 spacecraft, a vision system that will guide a robotic arm to capture a free floating satellite in the Orbital Express space demonstration, and vision systems for servicing of the Hubble Space Telescope. Several on-going research and development projects address development of vision systems for satellite servicing, localization and terrain mapping for planetary rovers, and environment modeling.
Bio

Dr Piotr Jasiobedzki leads research and development in vision systems for space and terrestrial applications at MDA Space Missions (formerly Spar Space Systems and MD Robotics) in Brampton ON. Before joining MDA in 1995, he worked at the University of Toronto (Canada) and University of Manchester (UK). His research interests include stereo and 3D vision, sensor fusion, biomedical image interpretation, and robot control and navigation. Dr Jasiobedzki has two patents and has published over 40 papers in technical journals and conference proceedings.

IEEE IROS 2005 Workshop on Robot Vision for Space Applications

Satellite Pose Acquisition and Tracking with Variable Dimensional Local Shape Descriptors

Babak Taati (1), Michael Greenspan (1, 2)
(1) Department of Electrical and Computer Engineering, (2) School of Computing
Queen’s University, Kingston, Ontario, Canada
[email protected], [email protected]

Abstract

A set of local shape descriptors for range data is proposed for reliable point matching in pose acquisition and tracking applications. Several local shape properties are extracted from the principal component space of various neighbourhoods and the similarity between histograms of these properties is used for hypothesizing point matches. The dimensionality of the descriptors may vary, and the effectiveness of several 1D through 9D histograms is experimentally analysed in tracking a range data sequence of a satellite model. The best performing descriptors are used for pose determination between pairs of range images and are shown to outperform two well-known existing descriptors.

Keywords: 3D pose acquisition, tracking, local shape descriptors, principal component analysis, range data

1 Introduction

Establishing feature correspondences reliably is necessary for efficient pose acquisition and tracking. Local Shape Descriptors (LSDs) have been utilised by researchers for encapsulating the local geometry information of various neighbourhoods and their similarity is used as an indicator of the similarity between the corresponding geometries. Point matches are then used for registering range image sequences for tracking or for registering a range image with a complete surface model for pose determination. Ideally, descriptors should be computationally and memory efficient, discriminating, and robust with respect to a variety of inevitable conditions such as noise and self-occlusion. They should also be robust with respect to the viewing angle.

In the Spin Images technique [8], 2D histograms are formed based on the distances of the neighbouring points from the normal vector and the tangent plane. The Point Signatures method [3] constructs 1D arrays based on the distance profile of the intersection of a sphere with the object from the tangent plane. Other local shape descriptors developed for pose estimation include Surface Signatures [12], Pairwise Geometric Histograms [1], Statistical Matrices [11], and Point Fingerprints [9]. A review of some of these techniques is offered in [2]. The abovementioned methods take a minimalist approach, in that they try to construct low dimensional and compact descriptors, presumably to favour efficiency. We propose that using high dimensional descriptors could be the key to more reliable and robust point matching without sacrificing computational and memory efficiency. To this end, we have developed a large variety of variable dimensional shape descriptors, based on local properties derived from principal component analysis. These descriptors can be considered as a generalization of well-known LSDs such as Spin Images and Point Signatures and our experimental results confirm their effectiveness in comparison to the existing methods.

The main motivation for our research is developing a pose acquisition technique for tracking a satellite based on LIDAR data as part of a larger system that could enable automatic rendezvous and robotic capture of malfunctioning satellites by an unmanned spacecraft for service and redeployment [7, 6]. Other applications for a 3D pose estimation system include 3D model building, industrial inspection, augmented reality, and computer aided surgery.
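To make the two baseline descriptors more concrete, the following is a minimal Python/NumPy sketch of a spin-image-style 2D histogram in the spirit of the description above (radial distance from the normal axis versus signed distance from the tangent plane); the radius, bin count, and point cloud are illustrative assumptions, not the configurations used in [8] or [3].

```python
import numpy as np

def spin_image(points, p, n, radius=0.15, n_bins=16):
    """Spin-image-style 2D histogram for a point p with unit surface normal n.

    alpha: distance of each neighbour from the line through p along n.
    beta : signed distance of each neighbour from the tangent plane at p.
    """
    d = points - p                                          # offsets from the basis point
    beta = d @ n                                            # distance from the tangent plane
    alpha = np.linalg.norm(d - np.outer(beta, n), axis=1)   # distance from the normal axis
    keep = (alpha <= radius) & (np.abs(beta) <= radius)     # restrict to the local support
    hist, _, _ = np.histogram2d(alpha[keep], beta[keep], bins=n_bins,
                                range=[[0.0, radius], [-radius, radius]])
    return hist / max(hist.sum(), 1.0)                      # normalise away sampling density

if __name__ == "__main__":
    cloud = np.random.rand(1000, 3)                         # stand-in for a range image
    print(spin_image(cloud, cloud[0], np.array([0.0, 0.0, 1.0])).shape)
```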

2 PCA-Based Descriptors

We construct our shape descriptors based on properties extracted from the principal component space of the neighbourhood of each point in a point cloud. Eigenvalue decomposition of the covariance matrix of each neighbourhood is used to associate an orthonormal frame (i, j, k) and three eigenvalue scalars (e1, e2, e3), representing vector lengths along each axis of the frame, to each point. Using these vectors and scalars we generate nine basic and several extended properties for each point in the neighbourhood and can form a variety of histograms that carry various levels of information about the local geometry.

The orthonormal frame of each point can be treated as a Cartesian coordinate frame known as the principal component space. While constructing a descriptor for a point, we refer to it as the reference point (p') and to its frame as the reference frame. For each neighbouring point (p), the basic properties define its relationship to the reference point and consist of three position scalars, three direction scalars, and three dispersion scalars.

The coordinates of each neighbouring point expressed in the reference frame, x, y, and z along i', the major, j', the semi-major, and k', the minor axes respectively, form the three basic position properties. Several extended position properties can be calculated based on these coordinates. Table 1 lists some possible extended position properties, the nomenclature we use for referring to them, and the range of possible values for each property, where R denotes the neighbourhood radius.

Table 1: Basic and Extended Position Properties
  Property     Description                                                     Range
  x, y, z      coordinates along the main axes                                 [-R, R]
  Xa, Ya, Za   axis distances: Xa = sqrt(y^2 + z^2), Ya = sqrt(z^2 + x^2),     [0, R]
               Za = sqrt(x^2 + y^2)
  dist         reference point distance: d = sqrt(x^2 + y^2 + z^2)             [0, R]

The rotation that aligns the orthonormal frame of a neighbouring point with the reference frame can be defined with three parameters. We chose the inner products of the corresponding axes between the two frames as the three basic direction properties. The rotation can be represented in various forms and therefore it is possible to construct several extended properties. Some of these extended direction properties are listed in Table 2, where Ci is the shorthand notation for cos(i).

Table 2: Basic and Extended Direction Properties
  Property            Description                                              Range
  Cθ, Cφ, Cψ          inner products of axes: Cθ = i · i', Cφ = j · j',        [-1, 1]
                      Cψ = k · k'
  roll, pitch, yaw    ZYX Euler angles                                         [-π, π]
  α, β, γ             ZYZ Euler angles                                         [-π, π]

Eigenvalues of the neighbourhood covariance matrix form the three basic dispersion scalars. Three scale-independent extended dispersion properties are generated by normalizing the basic values by the corresponding dispersion property of the reference point. Table 3 lists the dispersion properties, where e'i refers to a basic dispersion property of the reference point.

Table 3: Basic and Extended Dispersion Properties
  Property        Description                                                  Range
  e1, e2, e3      eigenvalues, |e1| > |e2| > |e3|                              [-R^2, R^2]
  ê1, ê2, ê3      ê1 = e1/e'1, ê2 = e2/e'2, ê3 = e3/e'3                        [-inf, inf]

Figure 1: Two points on a satellite point cloud, their neighbourhoods, and their reference frames

Figure 1 shows a satellite range image containing nearly 50,000 3D points. The neighbourhoods of two selected points (p and p') are shown in a darker shade. The local frames of the two points are illustrated and are labeled with (i, j, k) and (i', j', k'). Values of the nine basic properties for point p are listed in Table 4 with respect to point p' as the reference point.

Table 4: Basic properties of point p in Figure 1 with respect to point p' as the reference point
  position (mm):      x = -356,      y = -265,      z = 16
  direction:          Cθ = -0.53,    Cφ = -0.46,    Cψ = 0.90
  dispersion (mm^2):  e1 = 4652,     e2 = 3429,     e3 = 118

For each reference point, a histogram could be constructed based on one or any combination of the properties of neighbouring points. We note that only nine independent properties exist for each neighbour point and therefore we limit our histograms to have one through nine dimensions, with each dimension being one of the 22 properties listed in Tables 1, 2, and 3. Histograms are normalized to the number of neighbouring points that contribute to their construction so the descriptors are robust to sampling resolution. The neighbourhood radii for performing the PCA and for selecting points that contribute to each histogram do not necessarily need to be the same, but were set to the same value in the experiments. Histogram intersection [10] is used for measuring the disparity between histograms and provides a continuous similarity measure in the [0, 1] interval, where 1 denotes identical histograms.

Since the range for the scale-independent dispersion properties is unlimited, we clip the values beyond a predetermined maximum before constructing a histogram based on them. We have also chosen to clip the values of the basic dispersion properties beyond a certain range since the great majority of their values lie within a range much smaller than their absolute possible maximum. In the experiments, the clipping ranges were empirically set to [0, R^2/2], [0, R^2/4], and [0, R^2/8] for the e1, e2, and e3 properties respectively, and to [1/2, 2] for the extended dispersion properties.
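The construction described in this section can be summarised in a short Python/NumPy sketch: a local PCA frame per point, the nine basic properties of a neighbour relative to a reference point, an n-dimensional histogram over a chosen property subset, and histogram intersection as the similarity measure. The radius, bin count, the chosen property subset, and the lack of eigenvector sign disambiguation are simplifications for illustration, not the exact configuration used by the authors.

```python
import numpy as np

def pca_frame(neighbourhood):
    """Orthonormal frame (columns = major, semi-major, minor axes) and eigenvalues
    (e1 >= e2 >= e3) of the covariance matrix of an (N, 3) neighbourhood.
    Note: eigenvector signs are ambiguous and would need a consistent convention."""
    evals, evecs = np.linalg.eigh(np.cov(neighbourhood.T))
    order = np.argsort(evals)[::-1]                       # sort descending
    return evecs[:, order], evals[order]

def basic_properties(p, p_ref, frame_p, evals_p, frame_ref):
    """Nine basic properties of neighbour p w.r.t. reference point p_ref:
    three position, three direction, and three dispersion scalars."""
    x, y, z = frame_ref.T @ (p - p_ref)                   # coordinates in the reference frame
    c_theta, c_phi, c_psi = np.sum(frame_p * frame_ref, axis=0)  # inner products of axes
    e1, e2, e3 = evals_p
    return np.array([x, y, z, c_theta, c_phi, c_psi, e1, e2, e3])

def descriptor(cloud, ref_idx, radius=0.15, bins=4, dims=(6, 7, 0)):
    """Variable dimensional histogram descriptor for one reference point;
    dims indexes the 9-vector of basic properties (an illustrative subset)."""
    p_ref = cloud[ref_idx]
    nbrs = cloud[np.linalg.norm(cloud - p_ref, axis=1) <= radius]
    frame_ref, _ = pca_frame(nbrs)
    props = []
    for p in nbrs:
        local = cloud[np.linalg.norm(cloud - p, axis=1) <= radius]
        frame_p, evals_p = pca_frame(local)
        props.append(basic_properties(p, p_ref, frame_p, evals_p, frame_ref))
    hist, _ = np.histogramdd(np.array(props)[:, dims], bins=bins)
    return hist / max(hist.sum(), 1.0)                    # normalise to neighbour count

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1 denotes identical normalised histograms."""
    return float(np.minimum(h1, h2).sum())

if __name__ == "__main__":
    cloud = np.random.rand(2000, 3)                       # toy stand-in for a range image
    print(histogram_intersection(descriptor(cloud, 0), descriptor(cloud, 1)))
```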

3 Experiments

A set of experiments was performed to analyse the point matching capabilities of the PCA-based descriptors and compare their performance with that of Spin Images and Point Signatures. A one-fifth scale model of the RADARSAT satellite and an Optech ILRIS 3D LIDAR sensor were mounted on two manipulator robots so that the LIDAR could scan the satellite from various orientations and distances while the robotic arms moved to simulate the motion of a spacecraft approaching the satellite. Movement of the arms was slow relative to the data acquisition rate so the motion skew was negligible. Since the configuration of both manipulators was known during the scanning period, a ground truth rigid transformation existed between various scans. A VRML model of the RADARSAT satellite is shown in Figure 2(a).

Figure 2: (a) RADARSAT VRML model (b) Corresponding points from the second image for the 100 randomly selected points on the first image

Two range images from the sequence were selected for the first point matching experiments in order to select a small subset of PCA-based descriptors. The disparity in the viewing angle for the two range images was 10° in the roll angle and 10° in the yaw angle. One hundred points were randomly selected on the first point cloud such that they were also visible on the second point cloud. Based on the ground truth transformation, the corresponding point for each of these random points on the second scan was identified. The second point cloud and the selected points on it are illustrated in Figure 2(b).

Figure 3 illustrates a sample confusion matrix generated for the hundred points based on 6D [e2, e3, Xa, e1, Za, γ] histograms of size 4^6. This confusion matrix is a 100×100 matrix in which the rows and columns represent the selected points on the first and second point clouds respectively. In the illustration, higher similarities are shown in darker shades and it can be seen that the main diagonal elements of the matrix are mostly darker than the off diagonal elements. Each diagonal element that is larger than all the other elements in its row (or column) represents a point that is correctly matched to its true match on the other range image. In this case, there were 73 points that were correctly identified.

Figure 3: A sample confusion matrix based on 6D [e2, e3, Xa, e1, Za, γ] histograms

Since we use RANSAC [4] for generating statistically stable poses from candidate correspondences, it is not strictly necessary for the points to be matched exactly with their true match as long as their true match gains a high rank based on the similarity between their descriptors. We define Rank_i as the number of true matches that were ranked within the top-i highest similarities. Rank3 was used as a performance measure in the experiments. The results showed that the relationship between Rank3 and Rank1 (i.e. the number of correct matches) was roughly linear.

The bounding box for the satellite model is approximately 2.1 m × 0.7 m × 0.5 m. The sampling resolution is such that range images of the tracking sequence each contain about 40k-70k 3D points (47,884 and 56,743 for the two selected images). The neighbourhood radius was empirically set to 15 cm. The number of bins for histograms along each dimension was set to 100, 35, 11, 6, 5, 4, 3, 3, and 3 for 1D through 9D histograms respectively. Based on the 22 properties listed in Tables 1 to 3, over a million possible histograms could be constructed for each point (C(22,1) + C(22,2) + ... + C(22,9) = 1,097,789).

A forward selection scheme was utilised for obtaining a suitable subset of descriptors from the vast pool of possible PCA-based histograms. First, 22 1D histograms based on the properties listed in Tables 1, 2, and 3 were tested and the top 8 descriptors were selected based on the Rank3 measure. The top 8 1D descriptors were then combined with all the 22 properties to form several 2D histograms and the 8 best performing 2D histograms were identified. Similarly, the top 8 histograms of higher dimensions were selected and each in turn was used to construct 8×22 higher dimensional histograms. In the end, we identified 8 top-performing descriptors for each dimensionality based on their point matching capabilities. Table 5 lists the 2 top performing histograms for each dimensionality. The basic dispersion properties seem to provide the highest discrimination power among the one dimensional histograms.

Table 5: Top-2 selected histograms at each dimension
  Dim.  Selected Histograms
  1D    [e1], [e2]
  2D    [ê2, e3], [e2, e3]
  3D    [ê2, e3, yaw], [ê2, e3, ê1]
  4D    [e2, e3, Xa, e1], [e2, e3, Za, e1]
  5D    [e2, e3, Xa, e1, Za], [e2, e3, Xa, e1, dist]
  6D    [e2, e3, Xa, e1, Za, γ], [e2, e3, Xa, e1, dist, γ]
  7D    [e2, e3, Xa, e1, Za, Ya, z], [e2, e3, Xa, e1, Za, Ya, dist]
  8D    [e2, e3, Xa, e1, Za, Ya, z, dist], [e2, e3, Xa, e1, Za, ê3, ê1, pitch]
  9D    [e2, e3, Xa, e1, Za, ê3, ê1, α, ê2], [e2, e3, Xa, e1, Za, ê3, ê1, pitch, α]

Figure 4(a) illustrates the Rank3 and Rank1 measures for the top performing descriptor and the average of these measures for the top 8 descriptors at each dimensionality. The Rank1 and Rank3 measures are larger than 50% and 70% respectively for all 2D and higher dimensional descriptors. The measures are larger than 60% and 80% respectively for 4D and higher dimensional descriptors. It can also be observed that higher dimensional descriptors (≥ 4D) provided better point matching than the lower dimensional ones despite the fact that they were more coarsely sampled.
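For clarity, the Rank_i measure and the greedy forward selection loop described above can be written compactly as follows; this is a hedged sketch in which the similarity matrix and the scoring function are toy stand-ins, not the actual descriptor evaluations.

```python
import numpy as np

def rank_k(similarity, k=3):
    """Rank_k: the number of ground-truth matches ranked within the top-k similarities.
    similarity[i, j] compares point i of the first image with point j of the second;
    as in the confusion matrix above, the true match of point i is column i."""
    diag = similarity[np.arange(len(similarity)), np.arange(len(similarity))]
    ranks = (similarity >= diag[:, None]).sum(axis=1)     # rank of the true match per row
    return int(np.sum(ranks <= k))

def forward_select(score_fn, n_props=22, beam=8, max_dim=9):
    """Greedy forward selection over property combinations: at each dimension the
    best `beam` combinations are extended by every property and re-scored."""
    frontier, best_per_dim = [()], []
    for _ in range(max_dim):
        candidates = {tuple(sorted(c + (p,)))
                      for c in frontier for p in range(n_props) if p not in c}
        frontier = sorted(candidates, key=score_fn, reverse=True)[:beam]
        best_per_dim.append(frontier[0])
    return best_per_dim

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_confusion = rng.random((100, 100)) + 0.5 * np.eye(100)
    print(rank_k(toy_confusion, k=3))
    # Toy score preferring properties 6, 7 and 8 (a stand-in for Rank3 on real data).
    print(forward_select(lambda c: sum(p in (6, 7, 8) for p in c), max_dim=3))
```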

Figure 4: Rank3 and Rank1 measures for the first (a) and the second (b) pair of satellite range images

A similar experiment was performed on a second pair of range images to confirm the discrimination power of the selected descriptors. One of the range images from the first pair was replaced with another image from the scanning sequence such that the new pair had a larger view point disparity, with 20° and 10° differences in roll and yaw angles respectively. Figure 4(b) plots the Rank3 and Rank1 measures for the previously selected top performing descriptors applied for point matching across the second pair. As expected, the numbers are smaller due to the larger disparity between the view points, but we notice that Rank3 is still over 50% for all 2D and higher dimensional descriptors. We also observe that the 6D histograms provide the best point matching capability.

Our final experiment was point matching between the second pair of satellite range images for pose estimation. Descriptors were constructed for 2% of the points on one image and for 0.1% of the points on the second image. Due to memory limitations, the number of bins along each direction for 9D histograms was set to 2 in this experiment. Since the LIDAR images are dense, the 2% subsampling provides a rather large set of points and this experiment is equivalent to constructing descriptors for a dense subset of model points offline and for a sparse set of scene points online. Therefore, in this experiment we refer to the densely subsampled image as the model and to the sparsely sampled image as the scene. The subsampling rates lead to almost 1,000 descriptors on the model and 57 descriptors on the scene.

Scene descriptors were matched to model descriptors based on the selected PCA-based descriptors, as well as Spin Images and Point Signatures. Figure 5 shows the Rank1, Rank3, and Rank10 measures for the PCA-based descriptor at each dimensionality, for Spin Images of size 35×35 (S1) and size 50×50 (S2), and for Point Signatures of length 24 (P1) and length 100 (P2). The parameters for Spin Images and Point Signatures were set according to the suggested values in [8] and [3]. The sphere radius for Point Signatures was set to 10 cm. The neighbourhood radius for constructing Spin Images was set to 15 cm and a neighbourhood radius of 4 cm was used for estimating the surface normal directions. We observe that even the low dimensional PCA-based descriptors outperform both Spin Images and Point Signatures.

Figure 5: Rank3, Rank1, and Rank10 for matching sparse scene points to dense model points

To further illustrate the point matching qualities of our PCA-based histograms against Spin Images and Point Signatures, Figure 6 shows rank histograms of the top PCA-based 1D and 6D histograms, 35×35 Spin Images, and Point Signatures of length 24. The rank histograms contain 20 bins, each representing 50 points. The first column, therefore, represents Rank50, or the number of scene points (out of 57) for which their true match on the model was ranked between 1 and 50 based on their histogram similarity. The second column represents Rank51-100, and so on. We observe that the first column contains a significantly larger number of points for the PCA-based descriptors than it does for Spin Images and Point Signatures.

Figure 6: Rank histograms for (a) PCA-1D, (b) PCA-6D, (c) Spin Image 35×35, and (d) Point Signature 100

Our implementations of Spin Images and Point Signatures were slightly simplified versions of the original methods. The simplifications in Spin Images include not compressing images and using histogram intersection rather than the image matching technique presented in [8]. Neither simplification should affect the point matching performance of Spin Images, since image compression is performed solely for the purpose of gaining memory efficiency, and the suggested image matching in [8] is devised for dealing with outliers and no outliers exist in our experimental data. We use simple histogram intersection for comparing Point Signatures instead of using the shift and compare strategy devised in [3]. This affects the matching quality for signatures with multiple global maxima.
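To illustrate how the ranked correspondences would feed the pose estimation step mentioned above, the following is a generic RANSAC sketch with a closed-form (SVD-based) rigid fit over 3-point samples; the iteration count and inlier threshold are arbitrary, and this is not the implementation evaluated in the paper.

```python
import numpy as np

def fit_rigid(src, dst):
    """Least-squares R, t such that dst ~ R @ src + t (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def ransac_pose(scene, model, n_iter=500, tol=0.05, rng=None):
    """Pose from putative correspondences scene[i] <-> model[i] (both (N, 3) arrays)."""
    rng = rng or np.random.default_rng(0)
    best_inliers = np.zeros(len(scene), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(scene), size=3, replace=False)
        R, t = fit_rigid(scene[idx], model[idx])
        err = np.linalg.norm(scene @ R.T + t - model, axis=1)
        if (err < tol).sum() > best_inliers.sum():
            best_inliers = err < tol
    R, t = fit_rigid(scene[best_inliers], model[best_inliers])   # refit on all inliers
    return R, t, best_inliers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    model = rng.random((57, 3))
    scene = model - np.array([0.1, 0.2, 0.0]) + 0.005 * rng.standard_normal((57, 3))
    scene[:10] = rng.random((10, 3))      # simulate a few mismatched correspondences
    R, t, inliers = ransac_pose(scene, model)
    print(inliers.sum(), t)
```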

4 Conclusion

We have developed a large set of variable dimensional local shape descriptors for 3D point matching across range images. The descriptors are based on nine basic and several extended properties extracted from the principal component analysis of each point’s neighbourhood. The extended properties are different expressions of the same information as the basic properties. Nevertheless, different representations of the same information could lead to histograms that differ in their point matching capacity and experimenting with various extended properties is therefore a useful exercise. Unlike the minimalistic approach of the previously developed descriptors, our descriptors include variable dimensional histograms of up to 9 dimensions.

Previous LSD approaches such as Spin Images obtain view independence by exploiting properties such as distance from the tangent plane or distance from the normal vector. They form a low dimensional array that carries some information about the distribution of these properties for some neighbourhood around each point. View independent properties used in these methods are a subset of our PCA-based properties. For instance, when a small radius is used for neighbourhood generation, the xy plane approximates the tangent plane (possibly with an offset along the z axis) and the distance from the normal vector and the distance from the tangent plane are equivalent to the Za and z position properties respectively. Therefore, our PCA-based descriptors encompass well-known existing descriptors such as Spin Images as well as a large variety of novel descriptors.

We have conjectured that since the existing descriptors were selected in an ad hoc manner, they did not necessarily contain the optimal subset of information from the vast pool of available properties and we could empirically determine a more optimal subset. Our experimental results confirmed that our descriptors indeed provided more reliable point correspondences than Point Signatures and Spin Images. While our current implementation is in the Matlab environment and does not provide meaningful results regarding the processing times, we note that the amount of computation required for constructing the PCA-based descriptors is about the same level as that of Spin Images or Point Signatures. Furthermore, the increased reliability of the point matches provided by the PCA-based descriptors could drastically reduce the RANSAC phase of a pose estimation routine and could lead to more efficient and reliable pose estimation for vision based tracking.

Our future work involves performing further experiments in registering various models with their range images. We also intend to utilize nonparametric statistical methods [5] for efficiently comparing the high dimensional descriptors in order to deal with the curse of dimensionality.

Acknowledgments

The authors would like to thank MDA Space Missions and NSERC for their support for this work.


References

[1] A. Ashbrook, R. B. Fisher, C. Robertson, and N. Werghi. Finding surface correspondence for object recognition and registration using pairwise geometric histograms. Proceedings of the 5th European Conference on Computer Vision, pages 674-680, 1998.
[2] R. Campbell and P. Flynn. A survey of free-form object representation and recognition techniques. Computer Vision and Image Understanding, pages 166-210, 2001.
[3] C. S. Chua and R. Jarvis. Point signatures: a new representation for 3d object recognition. International Journal of Computer Vision, Vol. 25, No. 1, pages 63-85, 1997.
[4] M. A. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communications of the ACM, Vol. 24, pages 381-395, 1981.
[5] I. Fraser and M. Greenspan. Color indexing by nonparametric statistics. International Conference on Image Analysis and Recognition, 2005.
[6] M. Greenspan, L. Shang, and P. Jasiobedzki. Efficient tracking with the bounded hough transform. Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition, pages 520-527, 2004.
[7] P. Jasiobedzki, M. Greenspan, and G. Roth. Pose determination and tracking for autonomous satellite capture. The 6th International Symposium on Artificial Intelligence, Robotics and Automation in Space, June 2001.
[8] A. E. Johnson and M. Hebert. Using spin images for efficient object recognition in cluttered 3d scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 3, pages 433-449, 1999.
[9] Y. Sun, J. A. Koschan, D. L. Page, and M. A. Abidi. Point fingerprints: A new 3-d object representation scheme. IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 33, No. 4, pages 712-717, 2003.
[10] M. J. Swain and D. H. Ballard. Color indexing. International Journal of Computer Vision, pages 11-13, 1991.
[11] G. Xiao, S. H. Ong, and K. W. C. Foong. Three-dimensional registration and recognition method for tooth crown. International Congress on Biological and Medical Engineering - The Bio-Era: New Frontiers, New Challenges, 2002.
[12] S. M. Yamany and A. A. Farag. Surface signatures: An orientation independent free-form surface representation scheme for the purpose of objects registration and matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 8, pages 1105-1120, 2002.

IEEE IROS 2005 Workshop on Robot Vision for Space Applications

Development of a Scaled Ground Testbed for Lidar-based Pose Estimation

Andrew C.M. Allen, Nikolai N. Mak, Christopher S. Langley
MDA, 9445 Airport Road, Brampton, Ontario, Canada
{andrew.allen, nikolai.mak, chris.langley}@mdacorporation.com

Abstract

Lidar (Light Detection and Ranging) sensors have been used successfully to provide range and bearing for on-orbit rendezvous of spacecraft from far to medium range. At close range lidar is also capable of producing multiple returns from a single target which permits three-dimensional imaging. Lidar-based pose estimation using sparse 3-D data is a promising enhancement of the sensor for orbital rendezvous. Unfortunately, given the ranges over which a lidar system incorporating pose estimation should be tested, development of a representative ground testbed is not straightforward. A satellite model and laboratory setup have been developed to predict rendezvous lidar sensor performance during initial pose acquisition, approach and capture phases. A technique is presented for geometric and temporal scaling of the experimental setup. Results from the laboratory tests are used to validate a high-fidelity simulation of the rendezvous system. Following simulator validation, analysis and numerical simulation are used to predict the on-orbit performance of the flight system.

1. Introduction

On-orbit satellite servicing is a concept for cost-effective preservation of satellites, whereby a servicer spacecraft docks and performs resupply or maintenance functions on a second craft. One of the key technical challenges to developing any feasible satellite servicing system is the ability to execute rendezvous and docking manoeuvres either automatically (with robotic navigation and guidance) or with a human pilot. This operation, shown in Figure 1, requires a chaser spacecraft (i.e., the servicer) to successfully approach and form a mechanical interface with a target spacecraft (the satellite to be serviced). Communication latencies can make real-time ground control of the chaser spacecraft impossible. A level of supervised autonomy is certainly possible, whereby ground controllers monitor the rendezvous status and send high level commands; but high-bandwidth control of the chaser's trajectory usually must be performed by the on-board avionics.

Figure 1 – Terminal rendezvous

Lidars have been proposed as a useful sensor for an autonomous rendezvous GN&C system. Lidars have two advantages over camera-based vision systems: they provide 3-D data at both long and short ranges, and are much less sensitive to ambient illumination conditions. Lidars typically fulfil the requirements for range and bearing to a target. At long range, each valid lidar pulse returns the range and bearing of the target. By taking the centroid of the lidar point cloud data, range and bearing can be derived. At medium and close range, a 3-D image of the target can be formed for further estimation of relative attitude and corresponding rates. However, the resolution of the sensor is such that pose estimation (i.e., position and orientation) of the target spacecraft can be determined only from the close-range point cloud data. Figure 1 shows the approximate ranges where range/bearing data and pose estimates are required.

In pulsed detection lidar rangefinding systems, short light pulses emitted by a laser source are directed towards a target and the light pulses scattered from the target are detected by an optical receiver. The time-of-flight is then measured by the associated measurement optics and electronics. The main functional blocks of this type of lidar-based rangefinding system are shown in Figure 3. In the Rendezvous Lidar System (RLS), the time-of-flight lidar that MDA has built for use in orbital rendezvous, the laser pulses are steered using a high-precision mirror to permit scanning of a single target; this enables the generation of 3-D range maps of the target vehicle.

Figure 2 – Rendezvous Lidar System (RLS)

Figure 3 – Block diagram of lidar

The mission requirement is to determine initial pose and track the relative pose changes of the target with sufficient accuracy and frequency to allow the chaser GN&C system to perform the rendezvous and docking. Ground testing of lidar-based pose estimation systems is not straightforward due to issues of geometric and temporal scaling. This paper describes our approach in dealing with these complex issues. Section 2 gives some background information on rendezvous and spaceborne pose estimation. Section 3 outlines our approach to lidar-based pose estimation and the issues associated with scaled laboratory testing. Experimental results and discussion are given in Section 4, and conclusions follow in Section 5.

2. Background

A survey of the literature on Automatic/Autonomous Rendezvous yields a variety of papers, books, and documents from the Russian, American, European, and Japanese space programs. These include definitive references and compendiums such as Automated Rendezvous and Docking of Spacecraft [3] and An Assessment of the Technology of Automated Rendezvous and Capture in Space [12]. Much of the algorithmic treatment of rendezvous is based upon the seminal work of Clohessy and Wiltshire [2], which relies on the earlier lunar orbit studies of Hill [6]. Space-based pose estimation systems include the Space Vision System (SVS) [10], the Video Guidance Sensor (VGS) [7], and its enhanced version [4]. It is worth noting that these predecessor systems use visual targets to determine pose. The RLS is a targetless system, meaning it can be applied to target spacecraft which have not been designed to cooperate with the sensor.

Table 1 – Target performance measures
  Measurement            Medium Range (~100 m)    Close Range (~20 m)
  Position Accuracy      ± 100 cm                 ± 30 cm
  Orientation Accuracy   ± 10°                    ± 3°
  Update Rate            1 Hz                     5 Hz

3. Ground Testbed Development

3.1 Objectives

Our objective is to perform a ground demonstration of lidar-based pose acquisition and tracking for a representative target spacecraft with representative relative motion. The value of a ground test bed is in the lidar sensor performance evaluation and prediction for all phases of autonomous rendezvous and docking. The design case is a rendezvous with the Hubble Space Telescope (HST, Figure 4). The HST relative velocities considered are as high as 8 cm/s with angular rates up to 0.3°/s. Target performance measures for the system are given in Table 1, and are appropriate for the requirements of a rendezvous mission.

In determining the requirements of the ground testbed, several issues present themselves. Naturally, experiments cannot be performed using the full-size objects (~5 metres) and full-scale distances (~200 metres) at a reasonable cost.

The use of scale modelling is appealing because 6 degree-of-freedom (DoF) trajectories can be readily actualized using manipulators with scale models of spacecraft. The question then becomes, how does one apply scaled laboratory results to predict the actual on-orbit performance of a satellite rendezvous system? As the following sections demonstrate, the answer is not as straightforward as one might think.

Figure 4 – HST frame of reference

3.2 Geometric Scaling

Scaling of linear measurements is determined by the scale of the satellite mockup. If the target object is scale 1:k, then all distances under consideration are divided by k as well. For example, in the 1:10 scale setup shown in Figure 7, a desired linear velocity of 40 mm/s is executed as 4 mm/s in the laboratory. Angles are scale-invariant, so orientations are kept the same. Angular velocities are likewise unaffected by geometric scaling.

Many ground tests would perform the previous scaling and stop there. However, the crucial and unavoidable issue is the noise inherent in the sensor. Although the size of the target object is decreased, it is still being imaged by a sensor with the same amount of noise. Put another way, this is equivalent to scanning a full scale object with k times the sensor noise. Even though the sensor noise cannot be changed, it can at least be quantified. Range and bearing noise can be measured by scanning known targets. The bearing spatial sampling (or image pixelation) and laser spot size effects can be combined into a single metric, called the effective instantaneous field of view (EIFOV) [9]. The EIFOV is derived from an average modulation transfer function (AMTF), which is a function of the spot size and spot spacing of the lidar beam. This function quantifies the spatial frequency response of the lidar, in much the same way that the modulation transfer function describes the spatial response of a camera. Because the spot size typically varies with range, the EIFOV must be calculated based on the actual distance in the lab and then analyzed with respect to the EIFOV in the real scale geometry. Ideally, the EIFOV of the lidar in the lab and in the real range should be equivalent in order to provide the same angular resolution. As can be seen in Figure 5, the laboratory test with the scaled Ilris is expected to have a much coarser angular resolution than that of the flight sensor.

Figure 5 – AMTF curves for RLS (blue) and scaled Ilris (red) at 30 metres range

Figure 6 – Side-by-side scaling comparison

3.3 Temporal Scaling

Unlike stereocameras, which acquire a single frame of data via simultaneous measurement of pixels in two camera images, a scanning lidar acquires point data sequentially. Skewing of point cloud data due to object motion during a scan can have a significant effect on the accuracy of the pose estimation technique. Because of this, it is important that the temporal motion of the object be scaled against the scan time of the sensor. The reduced size of the test object changes the frame rate of the sensor, which also causes a time scale variation. The time scale variation can be induced by the difference in the frame rate between the laboratory sensor and the flight sensor. In the prototype experiments discussed below, a terrestrial Optech Ilris-3D was used in lieu of the RLS optical head; the Ilris is intended for terrestrial metrology and has been designed with a slower scan speed. If the time to acquire a single frame in the lab is m times longer than the expected flight scan time, the relative motion rates (both linear and angular) of the target object must be decreased by a factor of m.
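The two scaling rules can be combined into a single helper; the sketch below is a minimal illustration assuming a geometric scale of 1:k and a lab scan time m times longer than the flight scan time, with the k and m values chosen only for the example (the actual m for the Ilris versus the RLS is not stated here).

```python
def lab_parameters(distance_m, lin_vel_mps, ang_vel_dps, k=10.0, m=4.0):
    """Convert flight-scale geometry and motion to lab-scale commands.

    Geometric scaling (1:k): distances and linear velocities are divided by k;
    angles and angular velocities are scale-invariant.
    Temporal scaling: if a lab scan takes m times longer than a flight scan,
    all relative motion rates are additionally divided by m.
    """
    return {
        "lab_distance_m": distance_m / k,
        "lab_linear_velocity_mps": lin_vel_mps / (k * m),
        "lab_angular_velocity_dps": ang_vel_dps / m,
    }

# Example: 8 cm/s and 0.3 deg/s at 40 m range, 1:10 scale, lab scans 4x slower (m is illustrative).
print(lab_parameters(40.0, 0.08, 0.3))
```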

3.4 Testbed Setup

The testbed setup is shown in Figure 7. Scale mockups of satellites can be mounted on the end of an industrial "target" manipulator, so that their motion can be controlled. Sensor packages may be fixed (e.g., on a tripod) or mounted on a second "chaser" manipulator if real-time control is to be tested. In the experiments described below, the lidar is mounted on the chaser, but the target manipulator was able to provide all of the relative motions required.

Figure 7 – Laboratory setup. Top: The Hubble Space Telescope mockup mounted on the "target" manipulator. Bottom: The terrestrial lidar mounted on the "chaser" manipulator, imaging the Radarsat-2 mockup.

The mockup currently used is a 1:10 scale model of the HST. The mockup was designed to be a reduced-order representation of the key geometric features of the HST. Complex objects (such as external mechanical interfaces) are modelled with primitive shapes. This not only simplified the fabrication and verification of the mockup, but allowed the creation of a precisely matching digital model based on the as-built dimensions. Having a precise model is beneficial because we are interested in seeing how well the system works with real-world lidar measurements and real-world processing restrictions, rather than investigating modelling inconsistencies.

Ground truth is established independently using a total station which has been rigidly installed in the laboratory. The lidar sensor is localized by comparing scans of fixed targets with measurements from the total station. The frame of resolution of the target robot (i.e., the frame of reference of the mockup) can be found by measuring known points on the surface of the mockup. Together with the robot's position telemetry, this measurement can be used to localize the base of the robot in the laboratory frame. Once these three transformations have been established, the ground truth pose of the mockup relative to the sensor can be calculated at any time from the robot's telemetry, with an order-of-magnitude greater accuracy than the output of the pose estimation system.

Pose tracking is performed using our hybrid implementation of the Bounded Hough Transform [5] and Iterative Closest Point [1], running on flight-equivalent avionics.
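The three transformations described above can be composed as 4x4 homogeneous matrices; the sketch below shows the chain, with identity rotations and made-up offsets standing in for the surveyed calibration values.

```python
import numpy as np

def make_T(R, t):
    """Build a 4x4 homogeneous transform from rotation R (3x3) and translation t (3,)."""
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def ground_truth_pose(T_station_lidar, T_station_base, T_base_mockup):
    """Pose of the mockup expressed in the lidar frame:
    T_lidar_mockup = inv(T_station_lidar) @ T_station_base @ T_base_mockup."""
    return np.linalg.inv(T_station_lidar) @ T_station_base @ T_base_mockup

# Hypothetical example values (identity rotations, metre offsets):
T_station_lidar = make_T(np.eye(3), np.array([2.0, 0.0, 1.5]))   # lidar surveyed by total station
T_station_base  = make_T(np.eye(3), np.array([6.0, 1.0, 0.0]))   # robot base surveyed likewise
T_base_mockup   = make_T(np.eye(3), np.array([0.0, 0.0, 1.2]))   # from robot telemetry
print(ground_truth_pose(T_station_lidar, T_station_base, T_base_mockup))
```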

3.5 Simulation Setup

Despite having a well-characterized testbed, it is still desirable to have a means of testing the pose estimation algorithm under flight-like circumstances. To that end, we have created a VRML-based simulator which generates synthetic lidar returns (see Figure 8). Having created a digital model of the mockup and having characterized the EIFOV and range noise in the sensor, it is possible to simulate the laboratory conditions, including relative motions, and compare the results. By using empirical data to validate and tune the simulator, simulated experiments can be performed using the actual flight parameters in order to determine the expected performance of the RLS system.

Figure 8 – Actual scan data vs. simulated scan data. The point cloud on the left was generated in the laboratory; dropped points are due to high intensity specular reflections. The data on the right was generated synthetically, using the nominal error values for the Ilris-3D lidar.

Currently our simulator includes the following parameters:
• Target model
• Target initial position
• Target relative velocity
• Field of view
• Scan density
• Range and bearing resolution and EIFOV (see Figure 8)
• Range measurement noise
• Angular measurement noise
• Pulse repetition frequency

Figure 9 (a-f) – Comparison of pose performance using both real and simulated data (simulation vs. as-tested three-sigma pose error in X, Y, Z, roll, pitch, and yaw as a function of range)
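A minimal synthetic-scan generator along these lines is sketched below: it samples the model points in acquisition order, translates the target between pulses to reproduce motion skew, and perturbs each return with range and pointing noise. The point-by-point timing, noise magnitudes, and pure translation are simplifying assumptions, not the parameters of the validated simulator.

```python
import numpy as np

def synthetic_scan(model_pts, lin_vel, scan_time, range_sigma=0.01, ang_sigma=1e-4, rng=None):
    """Generate one noisy lidar frame of a translating target (sensor at the origin).

    model_pts : (N, 3) target points in the sensor frame at the start of the scan
    lin_vel   : (3,) target velocity in m/s
    scan_time : time to acquire the full frame in seconds (points sampled in order)
    """
    rng = rng or np.random.default_rng(0)
    n = len(model_pts)
    t = np.linspace(0.0, scan_time, n)                  # acquisition time of each pulse
    pts = model_pts + t[:, None] * lin_vel              # motion skew during the scan
    r_true = np.linalg.norm(pts, axis=1)
    dirs = pts / r_true[:, None]
    r_noisy = r_true + rng.normal(0.0, range_sigma, n)  # range measurement noise
    dirs = dirs + rng.normal(0.0, ang_sigma, (n, 3))    # small pointing (angular) noise
    dirs = dirs / np.linalg.norm(dirs, axis=1)[:, None]
    return dirs * r_noisy[:, None]

# Illustrative use: 5000 points near 30 m range, 8 cm/s along x, a 1 s frame.
cloud = np.random.rand(5000, 3) + np.array([30.0, 0.0, 0.0])
frame = synthetic_scan(cloud, np.array([0.08, 0.0, 0.0]), scan_time=1.0)
print(frame.shape)
```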

4. Results and Discussion

4.1 Laboratory Results and Validation of Simulation

Figure 9 compares the pose estimation accuracy as tested in the lab with that derived for the same parameters in simulation. Experiments were performed with the target simultaneously translating and rotating. The scaled range (28 to 41 metres) is limited by the range of motion of the manipulator. Results are stated in the model frame of reference. Dotted lines show the required performance from Table 1. The results are quite similar between the lab test and the simulation, with the possible exception of X and roll. The most difficult degree of freedom to localize is roll, which corresponds to rotation about the long axis of the satellite (see Figure 4). Based on these results, the simulator has been validated for predicting the performance of the flight system under similar conditions.

4.2 Flight Simulation Results

Figure 10 shows the expected flight performance based on simulated data. Results are stated in the sensor frame; the HST roll axis was aligned with the yaw axis of the lidar. The advantage of the simulation is clearly demonstrated by the extent of ranges which can be tested. These results imply that the RLS system will be capable of producing pose estimates within the required accuracy; this information would not be attainable from ground tests without the scaling approach outlined in Section 3.

Figure 10 – Expected flight performance (three-sigma error vs. range for position and orientation)

5. Conclusions

The detailed scaling approach taken herein allows us to perform scaled lidar-based pose estimation experiments using representative targets and relative motions. These results have been used to validate a simulation of the flight system and make predictions of its performance under circumstances which cannot be tested on the ground.

References

[1] Besl PJ, McKay ND, "A Method for Registration of 3-D Shapes", IEEE PAMI, 14(2): 239-256 (1992)
[2] Clohessy WH, Wiltshire RS, "Terminal Guidance System for Satellite Rendezvous", J. Aerosp. Sci., 27(5): 653-658 (1960)
[3] Fehse W, Automated Rendezvous And Docking Of Spacecraft, Cambridge: Cambridge UP (2003)
[4] Granade SR, "Advanced Video Guidance Sensor and Next-Generation Autonomous Docking Sensors", Proc. SPIE, 5418: 38-49 (2004)
[5] Greenspan M, Shang L, Jasiobedzki P, "Efficient Tracking with the Bounded Hough Transform", Proc. IEEE CVPR, 1: I520-I527 (2004)
[6] Hill GW, "Researches in Lunar Theory", Amer. J. Math., 1: 5-26 (1878)
[7] Howard RT, Bryan TC, Book ML, "Video Guidance Sensor Flight Experiment Results", Proc. SPIE, 3380: 315-326 (1998)
[8] Jasiobedzki P, Greenspan M, Roth G, Ng HK, Witcomb N, "Video-based System for Satellite Proximity Operations", Proc. ASTRA (2002)
[9] Lichti DD, "New Angular Resolution Measure for 3-D Laser Scanners", SPIE Optical Engineering, 43(10): 2218-2219 (2004)
[10] MacLean SG, Pinkney HFL, "Machine Vision in Space", Canadian Aeronaut. Space J., 39(2): 63-77 (1993)
[11] Martin E, Maharaj D, Richards R, Tripp JW, Bolger J, King D, "RELAVIS: The Development of a 4D Laser Vision System for Spacecraft Rendezvous and Docking Operations", Proc. SPIE, 5418: 69-80 (2004)
[12] Polites ME, An Assessment of the Technology of Automated Rendezvous and Capture in Space, Marshall Space Flight Center: NASA/TP-1998-208528 (1998)

IEEE IROS 2005 Workshop on Robot Vision for Space Applications

Scalable real-time vision-based SLAM for planetary rovers

Robert Sim, Matt Griffin, Alex Shyr, and James J. Little
Laboratory for Computational Intelligence
University of British Columbia
Vancouver, BC, V6T 1Z4, Canada
{simra,mgriffin,shyr,little}@cs.ubc.ca

Abstract

Simultaneous localization and mapping is an important problem for autonomous planetary rovers and other space vehicles. While many authors have addressed the SLAM problem, few have done so in the context of producing large-scale maps in real time using vision. This paper is concerned primarily with the issues presented by the large numbers of candidate features obtained from vision sensors, and the implications for data association. We present a Rao-Blackwellised particle filter (RBPF) SLAM implementation that employs a stereo camera and the SIFT feature detector, and demonstrate that we can build maps of unknown environments with an extensive number of visual landmarks.

1 Introduction Autonomous planetary rovers that rely on vision sensing require the ability to construct dense visual representations of their environment for the purposes of both navigation and data collection. A central problem to constructing these representations is that as a mobile rover explores it accumulates error in its pose estimate, and subsequently its acquired map becomes inaccurate. This problem, generally referred to as simultaneous localization and mapping, or SLAM, has been widely studied and a variety of solutions have been proposed (for example [3,5,8,9]). However, there are a limited number of vision-based solutions that address real-time mapping, and representations that can scale up to thousands of mapped features. This paper presents an approach and experimental results for achieving SLAM solutions in realtime over long trajectories (73m or more), resulting in maps consisting of many thousands of landmarks. Our approach to solving the SLAM problem with a vision sensor is to combine a Rao-Blackwellised particle filter (RBPF)-based approach to mapping [8], coupled with efficient data structures developed by Montemerlo, et. al. for

Figure 1. Left: A rendering of the map from a sensor’s-eye view. Right: an image of the lab from a nearby position. representing a distribution over maps (referred to as FastSLAM [7]), and fast data association techniques for matching the relatively unambiguous feature descriptors obtained using the SIFT feature detector [6]. RBPF-based SLAM solutions operate by maintaining multiple map hypotheses, each associated with a stochastically sampled trajectory through the environment. The complete set of sampled trajectories and inferred maps approximates the probability distribution of maps conditioned on the vehicle’s actions and observations, p(M |A, Z), where M = {m1 , m2 , . . . , mn } is the set of maps, each consisting of a set of probability distributions describing landmark positions, A = {u1 , u2 , . . . , um } are the control inputs to the vehicle (that is, the vehicle’s actions), and Z = {z1 , z2 , . . . , zm } are the vehicle’s observations of the world (for brevity, we assume actions and observations are interleaved). One of the important contributions of the FastSLAM algorithm is the data structure it employs to share information between trajectory samples with common history. This facilitates real-time performance of the algorithm as the trajectory length grows. As an exploratory vehicle moves through the environment, the number of landmarks in its map can grow to number in the hundreds of thousands. This is especially true for vision-based mapping, where feature detectors might typi-



typically return 500 feature observations in a single image. This poses a difficult data association problem, where a single feature observation might require comparison with all of the landmarks in the map. Such an extensive comparison might be required when extracted features are generic, without any uniquely defining characteristics (such as those typically employed in mapping algorithms using laser range sensors). Furthermore, the computed data association is rarely unique, and often highly ambiguous. While FastSLAM allows for multiple data association hypotheses, these can reduce the robustness of the particle filter and potentially lead to sample starvation. In vision, however, there is usually a great deal of contextual information associated with a feature that can constrain data association and reduce the cost of matching. In our work, we employ the SIFT feature descriptor, which provides descriptions of feature observations that have been shown to be very robust for feature correspondence. In addition, we apply a kd-tree over the space of SIFT features to facilitate approximate nearest-neighbor lookups in time logarithmic in the number of visually distinct landmarks. The main contributions of this paper are two-fold. First, we present an implementation of FastSLAM which is based on vision-based sensing, rather than traditional range sensing with a laser. Second, we present methods for performing rapid data association of hundreds of landmark observations in a single image against a database of tens of thousands of mapped landmarks. These results combine the strengths of particle filter-based approaches to uncertainty estimation (such as the possibility of multi-modal and non-Gaussian estimates) with data association techniques that were previously only applied to Kalman filter-based estimators (for example, [9]). Furthermore, where previous implementations of the FastSLAM algorithm have generally employed sensors with a wide field of view, our experimentation demonstrates the performance of the algorithm using sensors with a comparatively narrow field of view. Finally, we demonstrate experimentally that robust SLAM solutions can be achieved in real time over long trajectories (more than 70 m). The remainder of this paper is structured to provide coverage of the strengths and weaknesses of current methods, elaborate on the details of our implementation, present and discuss experimental results, and finally discuss planned improvements.

2 Related Work

There is a significant body of literature on SLAM using the Extended Kalman Filter and its information-form counterpart, the Extended Information Filter [3, 5, 12]. These approaches model the posterior distribution over maps as a unimodal Gaussian distribution. Of particular interest is the view-based ap-

proach of Eustice et al. [3], which enables constant-time filter updating without significant sparsification approximations. However, a significant difficulty with a view-based approach is that the resulting map does not lend itself well to evaluation or human inspection, a strong prerequisite for an exploratory vehicle. Two map representations are popular in the literature: landmark-based [7, 9, 11] and occupancy-grid-based [2, 4]. Occupancy grids are effective for dense but ambiguous information, while landmarks are more suited to sparse but distinguishable features. Very impressive occupancy grids have been produced online by recent scan matching techniques, which also use particle filters for pose estimation [2, 4]. Landmark and vision-based approaches have also performed well in the past, as in [9]. In the latter case, a reasonably small environment was successfully mapped using a Kalman filter and assuming independence between landmark and pose estimates. For large environments, this approach is likely to be overconfident and lead to filter divergence. In related work, we have also applied our approach to mapping where control and odometry information is unknown: in Sim et al. [10], we demonstrated an approach to solving the SLAM problem using the method outlined here, coupled with a visual odometry estimate for motion estimation.

3 Simultaneous Localization and Mapping

This paper represents map estimation as the evolution of a Rao-Blackwellised particle filter [8]. In this context, the trajectory and landmark distribution is modeled as a dynamic Bayes network, where trajectories are instantiated as samples, and the landmark distribution can be expressed analytically for each trajectory. At time t, let $s_t$ denote the vehicle pose, $m_t$ the map learned thus far and $x_t = \{s_t, m_t\}$ be the complete state. Also, let $u_t$ denote a control signal or a measurement of the vehicle's motion from time t-1 to time t and $z_t$ be the current observation. The set of observations and controls from time 0 to t are denoted as $z^t$ and $u^t$ respectively. Our goal is to estimate the density

$$p(s^t, m_t \mid z^t, u^t) = p(x_t \mid z^t, u^t) \qquad (1)$$

It has been demonstrated elsewhere that $p(s^t, m_t \mid z^t, u^t)$ can be approximated by factoring the distribution in terms of sampled trajectories $s^t$, and independent landmark distributions conditioned on the sampled trajectories [8]:

$$p(s^t, m_t \mid z^t, u^t) \approx p(s^t \mid z^t, u^t)\, \prod_k p(m(k) \mid s^t, z^t, u^t) \qquad (2)$$

where $m(k)$ denotes the k-th landmark in the map. That is, we instantiate a set of samples $s^t$, propagate them according to $u_t$, and construct maps for each according to $z^t$.



A simplistic approach to running an RBPF for SLAM would yield an update complexity of O(M N ), where M is the number of particles at each step and N is the number of landmarks. However, Montemerlo et al. introduced in their FastSLAM work a tree-based structure which refines this complexity to O(M log N ) by sharing landmark estimates between samples [7]. Each sample in the filter will share unaltered landmark estimates with its ancestor particles (those landmarks that have not been observed from the time of the ancestor to the present). Each landmark observation results in a landmark being copied from its parent and updated but the rest of the tree remains the same.

3.1 Data Association In an unmodified natural environment, landmarks are difficult to uniquely identify. This problem is known as data association or correspondence and incorrectly matching observations to landmarks can lead to inconsistencies in a map. Stereo vision can quickly provide 3D information and when coupled with a scale-invariant feature transform (SIFT) detector [6] it can provide distinct landmarks. SIFT features are desirable as landmarks because they are somewhat invariant to image scale, rotation and translation as well as to illumination changes and affine or 3D projection. This combination can result in many viable landmarks from an unaltered environment.
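To make the stereo-plus-SIFT pipeline concrete, the sketch below back-projects a matched feature pair from a rectified stereo image into a 3D point. This is a minimal illustration under stated assumptions, not the paper's implementation: the focal length, principal point and baseline (fx, cx, cy, baseline) are placeholder values, not the calibration of any particular camera.

```python
import numpy as np

def landmark_from_stereo(uL, vL, uR, fx=500.0, cx=320.0, cy=240.0, baseline=0.12):
    """Back-project a matched feature (pixel uL,vL in the left image, uR in the
    right image of a rectified stereo pair) to a 3D point in the left-camera frame."""
    disparity = uL - uR
    if disparity <= 0:
        return None                      # no valid depth for this match
    z = fx * baseline / disparity        # depth from disparity
    x = (uL - cx) * z / fx               # lateral offset
    y = (vL - cy) * z / fx               # vertical offset (square pixels assumed)
    return np.array([x, y, z])
```

Each such 3D point, tagged with its SIFT key, is what the filter then treats as a landmark observation.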

of particles. After taking an observation (described in the next section), each particle in the current generation is weighted according to the probability of the current observation $z_t$, conditioned on that particle's trajectory:

$$w_i = p(z_t \mid s_{i,t}, m_{i,t}) \qquad (3)$$
$$\phantom{w_i} = k \exp\!\left(-0.5\, \Delta z^T \Sigma^{-1} \Delta z\right) \qquad (4)$$

where $\Delta z = h(s_{i,t}) - z_t$, $h(\cdot)$ is a generative model of the observations as a function of pose, and $\Sigma$ is the sum of the observation covariance and the observed landmark covariance. The particle is weighted by how consistent the current observation is with the map constructed from that particle's trajectory. The weights are subsequently normalized across the population of samples, which is then sampled probabilistically with replacement to produce the next generation of particles. Should any particle not be chosen for advancement, it is pruned, and all particles with no children are then recursively removed from the tree. If an insufficient number of particles is used, or resampling takes place too frequently, this can lead to starvation as hypotheses are pruned.
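The following sketch shows one way the weighting and resampling step just described could be implemented (Equations 3-4 followed by sampling with replacement). It is a simplified illustration: the `particles` objects, their hypothetical `predict` method (returning a predicted measurement and innovation covariance per observation) and `copy` method are assumed interfaces, not the authors' code, and the weights are accumulated in log space for numerical stability.

```python
import numpy as np

def weight_and_resample(particles, observations, rng=None):
    """Weight each particle by Eqs. (3)-(4) over its observations, then
    resample the particle set with replacement to form the next generation."""
    rng = rng or np.random.default_rng()
    log_w = np.zeros(len(particles))
    for i, p in enumerate(particles):
        # p.predict(...) is assumed to return, for each observation, the
        # predicted measurement h(s_{i,t}) and the innovation covariance Sigma
        for z, (z_pred, Sigma) in zip(observations, p.predict(observations)):
            dz = z_pred - z                          # Delta z = h(s_{i,t}) - z_t
            log_w[i] += -0.5 * dz @ np.linalg.solve(Sigma, dz)
    w = np.exp(log_w - log_w.max())                  # stable normalization
    w /= w.sum()
    idx = rng.choice(len(particles), size=len(particles), replace=True, p=w)
    return [particles[i].copy() for i in idx]        # unchosen particles are effectively pruned
```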

4.2 Sensing and Data Association

4 Implementation

4.1 State Representation

We describe samples of the vehicle's pose with the vector $s_t = [x, y, \theta]$, situated in a plane¹. At each time step, the N pose samples are propagated according to the motion model $p(s_t \mid s_{t-1}, u_t)$, which is user-defined. Over time the distribution of samples can become non-Gaussian, and even multi-modal. The noise model we apply is designed to take a conservative approach to estimating the possible drift in the robot's pose over time, while keeping in mind that noisier models require more particles to prevent starvation as the underlying pose distribution disperses. The specific action sequence is dependent on the robot's exploration policy. For this paper, we drive the robot by hand, and infer the robot's actions from odometry measurements between observations. For each action the filter produces a new generation of particles that inherit properties from the previous generation. The set of hypothetical trajectories executed by the robot is represented by this tree

¹ Our current work is aimed at extending our results to a full 6-DOF motion model.

Figure 2. Each particle has an associated map, organized by SIFT descriptor. Similarly, each SIFT descriptor might have multiple landmark estimates, each spatially distinct.

We employ a data structure similar to that described in [7] as a map representation. Each particle has associated with it a set of landmark estimates, described by Gaussian distributions. However, in the vision-based case, we take advantage of the descriptive power of the SIFT transform (described below), enabling us to improve the quality of data association. In this formulation, each particle maintains a list of SIFT IDs, and these IDs in turn point to a linked list of one or more 3D landmark estimates (Figure 2). Note that one SIFT ID can point to multiple landmarks: landmarks that have similar appearance but are spatially distinct.
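As a rough sketch of the per-particle map just described, the structure below keys landmark estimates by SIFT ID, with each ID holding a list of spatially distinct 3D estimates. For simplicity it stores a flat dictionary per particle, whereas the paper's FastSLAM-style tree shares unchanged estimates between a particle and its ancestors; the class names and the 0.5 m association gate are illustrative choices, not values from the paper.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class Landmark:
    mean: np.ndarray                      # 3D position estimate
    cov: np.ndarray                       # 3x3 position covariance

@dataclass
class Particle:
    pose: np.ndarray                      # [x, y, theta]
    # map: SIFT ID -> one or more spatially distinct landmark estimates
    landmarks: dict[int, list[Landmark]] = field(default_factory=dict)

    def observe(self, sift_id: int, point: np.ndarray, cov: np.ndarray, gate: float = 0.5):
        """Associate an observed 3D point with the nearest existing estimate
        sharing this SIFT ID, or start a new landmark if none is within `gate`."""
        candidates = self.landmarks.setdefault(sift_id, [])
        if candidates:
            nearest = min(candidates, key=lambda lm: np.linalg.norm(lm.mean - point))
            if np.linalg.norm(nearest.mean - point) < gate:
                return nearest            # caller would apply an EKF update here
        new_lm = Landmark(point.copy(), cov.copy())
        candidates.append(new_lm)
        return new_lm
```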



We are using a Point Grey Research BumbleBee stereo camera as our primary sensor, and extract SIFT features using a difference-of-Gaussian detector [6]. The features' partial invariance to image scale, rotation, translation and 3D or affine projection is what makes them desirable landmarks. Each SIFT feature has a 36-dimensional identifier, or key, associated with it, and matching is based on finding a suitably distinct match. We perform a linear search of the keys in the left image for each key in the right. The two keys with the smallest Euclidean distances from our target key are found, and if the ratio of the best and second-best distances is below a set threshold (currently 0.6) it is considered a good match. That is, for keys $k_{l1}$, $k_{l2}$ in the left image and $k_r$ in the right image, a successful match of $k_{l1}$ to $k_r$ satisfies

$$\|k_{l1} - k_r\| < 0.6\, \|k_{l2} - k_r\| \qquad (5)$$

Once a 3D feature is extracted from the stereo pair, we determine whether this feature corresponds to one we have seen before. Our approach to data association is depicted in Figure 3. To efficiently store and access what can quickly become a large number of keys, we use a kd-tree. The kd-tree facilitates nearest-neighbor matching in time logarithmic in the size of the tree, and has been demonstrated to be reliable for object recognition tasks [1]. The disadvantage of using a kd-tree is that it can sometimes produce not the nearest match but only a close match. We maintain a single tree for the sensor and associate an arbitrary integer ID with each SIFT identifier we add. New keys are considered to be candidate keys and are not passed as observations to the particle filter until they have been observed for a sufficient number of frames; since we do not currently use negative information to remove erroneous landmarks from the maps, this is an effort to limit their number. Each particle's map is indexed by the set of IDs associated with SIFT descriptors, and each node contains a linked list of landmarks sharing that descriptor. Multiple data associations can be entertained by the filter because each particle determines the specific landmark to which an observation corresponds. The number of landmarks associated with an ID is typically quite small, as shown by Table 1. A particle's weight is updated for a given landmark observation according to Equation 4 by first selecting, from the linked list for the matched landmark ID, the landmark estimate that is closest to the observed point in the global frame of reference. The maximum distance threshold for this comparison is based on an approximation of the camera's error, and if multiple landmarks fall within this range the closest is chosen. Since the filter is initiated without a map, any observation with an unknown ID, or with a 3D position which does not match, is treated as a new landmark.
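A minimal version of the global SIFT-key table might look like the sketch below, using SciPy's kd-tree for nearest-neighbour lookups. Two caveats: the tree is rebuilt on every insertion here purely for brevity (the paper maintains and grows a single tree), and reusing the 0.6 distance-ratio test of Equation 5 for the database lookup is our assumption, not something the paper states.

```python
import numpy as np
from scipy.spatial import cKDTree

class SiftDatabase:
    """Global table of SIFT keys; each key gets an integer ID shared by all particles."""
    def __init__(self):
        self.keys, self.tree = [], None

    def match_or_add(self, key, ratio=0.6):
        """Return the ID of the nearest stored key if it passes the distance-ratio
        test; otherwise insert the key as a new candidate ID."""
        if self.tree is not None and len(self.keys) >= 2:
            dist, idx = self.tree.query(key, k=2)
            if dist[0] < ratio * dist[1]:
                return int(idx[0])
        self.keys.append(np.asarray(key, dtype=float))
        self.tree = cKDTree(np.vstack(self.keys))   # rebuilt here for brevity only
        return len(self.keys) - 1
```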

In the following section we describe our experimental results.

5 Experimental Results

For the purposes of our experiments, we used an RWI B14 robot with a BumbleBee stereo head from Point Grey Research. The robot was driven through a laboratory environment and collected 5000 images along a trajectory of approximately 74 m. We ran the system using 100 particles, which enabled a frame rate of approximately 2.1 Hz (Figure 5). Table 1 describes, at approximately 1000 time-step intervals, the average number of landmarks associated with each map, the total distance traveled, the total number of SIFT IDs, the time step in history at which the filter converges to a single map, and the total number of landmark instances in the system (these can differ from the product of samples and mean landmarks, as many instances are not promoted to full landmarks until they have been observed at least 3 times).

Table 1. Map summary (see text for details).

Time | Avg. LMs per sample | Dist. traveled (m) | Tracked SIFT features | Filter convergence | Total LMs
5043 | 31577 | 72.64 | 29462 | 4847 | 108636
4009 | 25300 | 59.66 | 23766 | 4000 |  84146
2987 | 18830 | 42.02 | 17826 | 2889 |  70369
2043 | 12840 | 27.05 | 12338 | 1989 |  60871
1021 |  6083 | 14.04 |  5922 |  919 |  23688

Figure 4 depicts the map constructed for the maximumlikelihood particle at the end of exploration, beside a map computed using dead reckoning alone. The filter-based map is clearly more accurate in that it correctly captures the rectilinear alignment of the three rooms traversed by the robot.

6 Discussion Among the key observations from our experiments is that we are able to successfully map a large environment in real-time. At the end of map construction, we are matching 29,462 SIFT features, and each map consists of more than 31,000 landmarks, with a total of only 109,000 landmarks shared across all the maps. As the maps grow in



Figure 3. SIFT-based data association. The robot takes a series of relative measurements and finds a matching ID for each in the kd-tree (Match). The sensor measurements are then passed to the individual particles (Observe): the conditioned observations are compared against the existing entries in the particle's map with the same ID, and each landmark entry's associated uncertainty (illustrated as an ellipse) is used to determine the likelihood of a match. Finally, the quality of the data association is assessed by the weighting step of the particle filter (Weight); particles that have low weights will tend to be pruned. In the illustrated example, ID 8 is a new ID and so is unknown by all particles; it will automatically be initialized as a new entry. ID 4, however, has three previous entries associated with it in both maps (other particles might have more or fewer than three entries). Particle 1 will match three of the four observations to existing entries in its map; the matches will increase the particle's weight by an amount determined by the quality of each match. Particle 2 will match only one of the observations; by creating three new entries, this particle will likely be weighted much lower than Particle 1.

Figure 4. Left: The constructed map for the best sample at the end of exploration. Yellow: maximum weight trajectory. Pink: Dead reckoning trajectory. Grid lines indicate 25cm intervals. Right: the constructed map using dead reckoning.




Figure 5. Processing time per frame (per-frame computation time in milliseconds). The mean frame rate is 2.1 Hz on a 3.2 GHz Intel Xeon processor.

size, only a small slow-down in computation time is exhibited. More importantly, relative to dead-reckoning, we have demonstrated that our resulting map is more accurate. This can be observed in the fact that the three rooms observed by the robot are correctly aligned, as well as the fact that when the robot returns to the first room it correctly locates the position of the door into the third room (near the center of the constructed map).

7 Conclusion

We have presented an implementation of an RBPF-based SLAM algorithm using a vision-based sensor. The key goal of our work is to facilitate scalable maps that incorporate large numbers of visual landmarks (on the order of hundreds of thousands). The primary contributions of this work are the facilitation of vision-based sensing in association with a particle filter that supports multiple data associations. We have also experimentally demonstrated the success of the system on a real robot. Among the goals of our work is the construction of visual representations of very large environments (on the order of 100 m in diameter), in particular outdoor environments. We also hope to extend this work to full 6-DOF representations, which will be better suited to space vehicles and exploration over rough terrain. We believe that the data structures developed in this work can facilitate the construction of maps of this size, and our ongoing goal is to accomplish this task experimentally. One outstanding question for our work is how to reliably cull SIFT features that are rarely observed; these features tend to clutter the kd-tree, and deletion requires costly re-balancing of the tree. Another outstanding question is whether loop closure can be successful in very large environments, and how many particles might be required to successfully close the loop. Finally, we are interested in developing approaches to autonomous exploration based on particle-based representations of the environment.

Acknowledgements

The authors wish to thank Pantelis Elinas for providing assistance with key matching and stereo correspondence. This work was supported in part by a Canadian Space Agency/NSERC Postdoctoral Fellowship and by NSERC undergraduate summer research assistantships.

References
[1] J. S. Beis and D. G. Lowe. Shape indexing using approximate nearest-neighbour search in high-dimensional spaces. In Proc. IEEE Conf. on Computer Vision and Pattern Recognition, pages 1000-1006, Puerto Rico, June 1997. IEEE Press.
[2] A. I. Eliazar and R. Parr. DP-SLAM 2.0. In Proc. ICRA 2004, New Orleans, LA, 2004. IEEE Press.
[3] R. Eustice, H. Singh, and J. Leonard. Exactly sparse delayed-state filters. In Proc. of the 2005 IEEE Int. Conf. on Robotics and Automation, pages 2428-2435, Barcelona, Spain, April 2005.
[4] D. Hähnel, D. Fox, W. Burgard, and S. Thrun. A highly efficient FastSLAM algorithm for generating cyclic maps of large-scale environments from raw laser range measurements. In Proc. Conf. Intelligent Robots and Systems (IROS), 2003.
[5] J. J. Leonard and H. F. Durrant-Whyte. Simultaneous map building and localization for an autonomous mobile robot. In Proc. of the IEEE Int. Workshop on Intelligent Robots and Systems, pages 1442-1447, Osaka, Japan, November 1991.
[6] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the Int. Conf. on Computer Vision, pages 1150-1157, Corfu, Greece, September 1999. IEEE Press.
[7] M. Montemerlo, S. Thrun, D. Koller, and B. Wegbreit. FastSLAM: A factored solution to the simultaneous localization and mapping problem. In Proceedings of the AAAI National Conf. on Artificial Intelligence, Edmonton, Canada, 2002. AAAI.
[8] K. Murphy. Bayesian map learning in dynamic environments. In Neural Information Processing Systems (NIPS), 1999.
[9] S. Se, D. G. Lowe, and J. Little. Mobile robot localization and mapping with uncertainty using scale-invariant visual landmarks. Int. J. Robotics Research, 21(8):735-758, 2002.
[10] R. Sim, P. Elinas, M. Griffin, and J. J. Little. Vision-based SLAM using the Rao-Blackwellised particle filter. In Proceedings of the IJCAI Workshop on Reasoning with Uncertainty in Robotics, Edinburgh, Scotland, 2005.
[11] R. Smith, M. Self, and P. Cheeseman. Estimating uncertain spatial relationships in robotics. In Autonomous Robot Vehicles, I. J. Cox and G. T. Wilfong, eds., pages 167-193, Springer-Verlag, 1990.
[12] S. Thrun, Y. Liu, D. Koller, A. Ng, Z. Ghahramani, and H. Durrant-Whyte. Simultaneous localization and mapping with sparse extended information filters. Int. J. Robotics Research, 23(7-8):693-716, 2004.



Estimation of the aerodynamical parameters of an experimental airship Diego Patino1,2 , Leonardo Solaque1,2 1 LAAS - CNRS 7 Avenue du Colonel Roche 31077, Toulouse Cedex 4, France Email: [email protected], [email protected]

Simon Lacroix1 , Alain Gauthier2 2

Andes University. Mathematics and Electronic Department Carrera 1a No. 18A-10 Bogot´a, Colombia Email: [email protected], [email protected]

By doing a good computational implementation of the parameter estimation, we can simulate several flight conditions that can be used as a substitute for standard wind tunnel measurements. The most important aim of this work is to determine the dynamical parameters of the robot. To describe the dynamics of an airship such as KARMA, we must measure the following variables¹:
- the roll (φ), the angle of rotation around the X axis;
- the pitch (θ), the angle of rotation around the Y axis;
- the yaw (ψ), the angle of rotation around the Z axis;
- the position in 3D space (η1 = [x y z]^T);
- the velocity in 3D space (V = [vx vy vz]^T).
All these data are registered with respect to the ENU (East, North, Up) reference frame. After the signals have been registered, we focus on the data quality problem, because all real sensors have a limited accuracy, which makes our task harder. Many algorithms for parameter estimation of dynamical systems have been proposed recently [2], [3], [4]. However, most of them are applicable only to limited cases, such as models with weak nonlinearities. Moreover, previous proposals cannot deal well with the problem of noise: only a few of them work under reduced amounts of noise. Because we must pay special attention to the suppression of noise and to the nonlinear nature of the system, we propose to use Kalman filter realizations. This filtering procedure is based on the minimization of the quadratic error between the model and the observed data; cf. [7]. Among the different possibilities offered by Kalman filtering procedures, we have selected two: the classical Extended Kalman Filter (EKF) and the Unscented Kalman Filter (UKF). The first one was proposed in [8], [12]; it is based on linearization around one particular trajectory. The second one has a better performance even under chaotic regimes [1], [11]. As we explained before, the estimation of aerodynamic parameters of meteorological dirigibles is an interesting source of applications for the methods and procedures proposed in this work. Moreover, testing on scale models does not provide satisfactory answers about the dynamic behavior of the real system; this is owed mostly to the huge amounts of data that practitioners must store and analyze. In this work we will show how to

Abstract— In this article we show the results of a method to identify the aerodynamic parameters of the model of a blimp in cruise flight conditions. The proposed model can later be used for the design of control laws. The approach consists in analysing data from various flights: it relies on the use of Kalman filtering methods to minimize a cost function in a computationally efficient way. The performance of the approach is remarkable, as shown by a comparison with results obtained in wind tunnel experiments. It can therefore provide a quite precise estimation of the dynamics of a blimp, including its aerodynamic parameters, while avoiding the cost of wind tunnel measurements.
Index terms—airship, identification, Kalman filter, state space model.

I. INTRODUCTION

In this work we present the parameter identification of a dynamic model for the robot KARMA [6], [9]. Its flight is strongly affected by nonlinear and meteorological phenomena such as wind currents and sudden pressure changes. This complex situation is a good test for our method, and it is also a good source of interesting applications. Usually, scientists describe the behavior of this kind of aircraft by doing thorough measurements of the dynamics of a scale model in a wind tunnel [10]. The methods and techniques proposed in this work will provide a better description of the real system. Indeed, we present a general computational scheme to simulate the flight of the dirigible under different atmospheric conditions. When we face the problem of describing nonlinear dynamical systems, we can use a technique based on parameter estimation. It consists in determining the coefficients of the system by fitting time series data obtained from measurements of the dynamical parameters of the body. Usually, this is done over many time periods along several real flights. This methodology is particularly useful for the analysis of the dynamics of nonlinear systems. Furthermore, it also allows us to design proper control laws according to the operational aims of the robot [4]. On the other hand, the performance and reliability of the model are strongly based on the quality of the available observations from the real robot and its scale models. We must face here the particular problem of data quality and the presence of noise. Therefore, the method used is well adapted to accomplish a particularly hard task: parameter estimation from noisy and corrupted signals.

¹ These quantities are measured by using an electronic compass and a Differential Global Positioning System (DGPS).



- Md is the matrix composed of masses and moments of inertia;
- the vector Td(ν) is composed of the Coriolis and centrifugal forces;
- g(η) contains the Archimedes force and the body weight;
- Tp(Fm, µ) is the propulsion force vector;
- Ta(ν) represents the aerodynamical vector that contains the aerodynamical forces.

The vector Ta(ν) is formulated as

$$T_a(\nu) = A\,\dot{\nu} + D_1(\nu_2)\,\nu + T_{sta} \qquad (2)$$

apply Kalman filtering to do realistic and cheap computational simulations of the dynamical behavior of blimps used in surveying. The present paper is organized as follows. In Section 2, we give a short introduction to the mathematical modelling of the dynamics of the dirigible using the physical laws of dynamical systems in a general way. In Section 3 we introduce the essentials of Kalman filtering in signal processing. Section 4 is devoted to the applications of the methods that we have devised; we also describe the dynamics of a particular airship, KARMA. In Section 5 we give some concluding remarks and present our future research work.


A is the matrix of added masses, D1(ν2) represents the "translation-rotation" and "rotation-rotation" phenomena following Bryson's theory [14], and Tsta is the vector of stationary efforts and moments at C of the dirigible. The control variables of the airship are: the vectorization angle of the motors µ, the elevation angle δe, the angle for rotations δg, and the torque generated by the motors Fm. We note that the sensor measurements are expressed in the frame R0; thus we have to consider the transformation matrices J1(η2) and J2(η2), given in (3) and (4).

II. DYNAMICAL MODEL

In this section we present the dynamical model of the airship in a brief way. The equations are quite complex and, since they are not the main topic of this paper, we will not dwell on them here; for detailed information cf. [10], [9], [6]. To describe the dynamics of the blimp we consider the following assumptions:
- the dirigible is a solid body,
- the total mass of the dirigible is constant,
- the center of the Archimedes force coincides with the volumetric center (C) of the dirigible envelope,
along with Newton's classical laws of mechanics and aerodynamics. Next, we define three reference frames: R0 as the global frame fixed to the earth, Rd as the local frame fixed to C, and Ra as the aeronautic frame fixed to C. This makes the description of the navigation of the blimp easier. Figure 2 shows the chosen reference frames.



The transformation matrices are

$$J_1(\eta_2) = \begin{bmatrix} \cos\psi\cos\theta & -\sin\psi\cos\phi + \cos\psi\sin\theta\sin\phi & \sin\psi\sin\phi + \cos\psi\sin\theta\cos\phi \\ \sin\psi\cos\theta & \cos\psi\cos\phi + \sin\psi\sin\theta\sin\phi & -\cos\psi\sin\phi + \sin\psi\sin\theta\cos\phi \\ -\sin\theta & \cos\theta\sin\phi & \cos\theta\cos\phi \end{bmatrix} \qquad (3)$$

and

$$J_2(\eta_2) = \begin{bmatrix} 1 & \sin\phi\tan\theta & \cos\phi\tan\theta \\ 0 & \cos\phi & -\sin\phi \\ 0 & \sin\phi/\cos\theta & \cos\phi/\cos\theta \end{bmatrix} \qquad (4)$$

between the local frame Rd and the global frame R0:

$$\dot{\eta} = \begin{bmatrix} J_1(\eta_2) & 0_{3\times 3} \\ 0_{3\times 3} & J_2(\eta_2) \end{bmatrix} \nu \qquad (5)$$

Fig. 1. Robot KARMA.

Fig. 2. The different frames.

Now, the dynamical model of our blimp is given by

$$M_d\,\dot{\nu} = -T_d(\nu) - T_a(\nu) - g(\eta) + T_p(F_m, \mu) \qquad (1)$$

where:
- $\nu = [\nu_1, \nu_2]^T = [u, v, w, p, q, r]^T$ is the state vector composed of the translation and rotation velocities in the local frame Rd;
- $\eta = [\eta_1, \eta_2]^T = [x, y, z, \phi, \theta, \psi]^T$ is the state vector of the position and angles in the global frame R0.

Usually, the expression of Tsta is cumbersome [10]. After several assumptions about the system, we can simplify Tsta and still obtain a general representation² which includes the aerodynamic phenomena:

$$T_{sta} = \begin{bmatrix} K_1 (c_{T1} + c_{T2}\,\alpha + c_{T3}\,\delta_e) \\ K_1 (c_{L1} + c_{L2}\,\beta + c_{L3}\,\delta_g) \\ K_1 (c_{N1} + c_{N2}\,\alpha + c_{N3}\,\beta) \\ K_2 (c_{lC1} + c_{lC2}\,\alpha + c_{lC3}\,\delta_e) \\ K_2 (c_{mC1} + c_{mC2}\,\beta + c_{mC3}\,\delta_g) \\ K_2 (c_{nC1} + c_{nC2}\,\alpha + c_{nC3}\,\beta) \end{bmatrix} \qquad (6)$$

where K1 and K2 are constants depending on the geometry of the dirigible, CT1 ... CT3 are the coefficients of the tangential effort, CL1 ... CL3 are the coefficients of the lateral effort, CN1 ... CN3 are the coefficients of the normal effort, ClC1 ... ClC3 are the roll coefficients with respect to C, CmC1 ... CmC3 are the pitch coefficients with respect to C, CnC1 ... CnC3 are the yaw coefficients with respect to C, α = f1(u, v, w) is the angle of incidence and β = f2(u, v, w) is the angle of slip.

² This approach is used in airship control.



TABLE I. NUMBER OF PARAMETERS TO ESTIMATE

Matrix     | A  | D1 | CT | CL | CN | ClC | CmC | CnC
Parameters | 11 | 7  | 3  | 3  | 3  | 3   | 3   | 3

The representation (6) provides a simpler model of the dynamics of the airship in cruise flight, meaning that the aerodynamical velocity and the height of the airship are constant. It also simplifies the formulation of the control laws. Our main aim is to estimate all the unknown parameters; Table I shows the 36 parameters required. We stress that all of them affect the aerodynamic behavior of the dirigible. To refer to the set of parameters we use the vectorial notation $\vec{\lambda}$.

III. DESCRIPTION OF THE KALMAN FILTER

The problem of filtering in statistics consists in determining the probability density. We analyze the nonlinear discrete case with mutually uncorrelated Gaussian noises $\epsilon \sim N_k(0, \sigma_\epsilon)$ and $\nu \sim N_k(0, \sigma_\nu)$ up to time $t_f$:

$$X_{k+1} = f(X_k, u_k) + \epsilon_k, \qquad Y_k = h(X_k) + \nu_k \qquad (7)$$

where X is the state variable defined in $\mathbb{R}^n$. All signals are sampled using the same time interval $k\Delta T$; cf. [12], [16], [17]. Kalman filtering consists in doing one prediction and one correction at every time step. The prediction estimates the mean of the filter density and produces the state estimate $\hat{x}(k|k-1)$. The correction uses the a priori information of the measurements $Y = \{Y_1, Y_2, \ldots, Y_{k-1}\}$ to adjust the prediction step; cf. [7], [8]. The important task here is to find the Kalman gain matrix $K_k$ of the model

$$\hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k\,(y_k - \hat{y}(k|k-1)) \qquad (8)$$

This can be done with the Extended Kalman Filter (EKF) [4] or with the Unscented Kalman Filter (UKF). Although these techniques are based on the same philosophy, they differ in the way they compute the Kalman gain matrix.

A. Extended Kalman Filter (EKF): This method uses the stochastic linear system

$$X_{k+1} = A_k X_k + B_k u_k + \epsilon_k, \qquad Y_k = C_k X_k + \nu_k \qquad (9)$$

and it estimates the mean and the variance in the following way:

$$\hat{x}(k|k-1) = A_k\, f(\hat{x}(k-1|k-1)) + B_k u_k \qquad (10)$$

We highlight the fact that the control input $u_k$ is deterministic. After solving one particular optimization problem, as explained in [7], we obtain the covariance matrix equations for the associated Kalman filter:

$$P(k|k-1) = A_{k-1}\, P(k-1|k-1)\, A_{k-1}^T + \sigma_\epsilon \qquad (11)$$
$$P(k|k) = (I - K_k C_k)\, P(k|k-1) \qquad (12)$$

Eq. (11) is called the prediction equation of the covariance; we compute it first. Eq. (12) is called the correction equation of the covariance; we compute it after having solved the prediction equation. Finally, we can write the Kalman gain as

$$K_k = P(k|k-1)\, C_k^T \left( C_k\, P(k|k-1)\, C_k^T + \sigma_\nu \right)^{-1} \qquad (13)$$

In this way, the algorithm that we propose implements equations (11)-(13)-(12)-(8) in the listed order. For the nonlinear case, we approximate the nonlinear functions f and h in (7) by their first-order Taylor series expansions. This approximation implies that we have to compute the Jacobians of the nonlinear functions and to evaluate them in the sampling procedure explained above; cf. [13].

B. Unscented Kalman Filter (UKF): There are many ways to deal with nonlinear state space models proposed by other authors; cf. [1], [5], [11]. This kind of procedure belongs to the class of statistical linearization schemes, in which densities are truncated instead of the models f and h. It uses a sample set $\chi_0^{2n}$, called sigma points, which maintains the same mean and covariance while it is propagated through the nonlinear state space model:

$$\chi_0(k-1|k-1) = \hat{x}(k-1|k-1), \quad \chi_i(k-1|k-1) = \hat{x}(k-1|k-1) + \beta, \quad \chi_{i+n}(k-1|k-1) = \hat{x}(k-1|k-1) - \beta, \quad \beta = \sqrt{(n+\lambda)\, P(k-1|k-1)} \qquad (14)$$

where $i = 1, \ldots, n$. The value $\lambda$ is a scaling parameter that determines the spread of the sigma points around the state and is usually set to a small positive value, and $\sqrt{\cdot}$ is the square root of the matrix P, which can be computed by Cholesky factorization [15]. The data set $\chi_0^{2n}$ is propagated through the full nonlinearities f and h:

$$\chi_i(k|k-1) = f(\chi_i(k-1|k-1)) \qquad (15)$$
$$\varphi_i(k|k-1) = h(\chi_i(k|k-1)) \qquad (16)$$

These results represent a probability density characterized by the mean

$$\hat{x}(k|k-1) = \sum_{i=0}^{2n} W_i\, \chi_i(k|k-1), \qquad \hat{y}(k|k-1) = \sum_{i=0}^{2n} W_i\, \varphi_i(k|k-1) \qquad (17)$$

and by the covariance

$$P_{YY}(k|k-1) = \sum_{i=0}^{2n} W_i\, (\varphi_i(k|k-1) - \hat{y}(k|k-1))(\varphi_i(k|k-1) - \hat{y}(k|k-1))^T \qquad (18)$$

as well as

$$P_{XY}(k|k-1) = \sum_{i=0}^{2n} W_i\, (\chi_i(k|k-1) - \hat{x}(k|k-1))(\varphi_i(k|k-1) - \hat{y}(k|k-1))^T \qquad (19)$$
$$P(k|k-1) = \sum_{i=0}^{2n} W_i\, (\chi_i(k|k-1) - \hat{x}(k|k-1))(\chi_i(k|k-1) - \hat{x}(k|k-1))^T \qquad (20)$$

where the values $W_i$ are called "weights" and can be computed in different ways; cf. [1]. The Kalman gain can be obtained as

$$K_k = P_{XY}\, P_{YY}^{-1} \qquad (21)$$

The prediction step of the Kalman filtering is defined by equations (18) to (20). The correction step is defined by the following formula:

$$P(k|k) = P(k|k-1) - K_k\, P_{YY}(k|k-1)\, K_k^T \qquad (22)$$

In short, the UKF method can be described as the sequence (14)-(15)-(16)-(17)-(18)-(19)-(20)-(21)-(8)-(22).

C. Kalman filter extension for parameter estimation: In this kind of method, we use the parameters as if they were a dynamical system in themselves. Thus, in the particular case of a steady-state regime, they must behave as constants:

$$\vec{\lambda}_k = \vec{\lambda}_{k-1} \qquad (23)$$

Notice that the variable $\lambda_k$ is not really a constant, because the equations of the Kalman filter modify the parameter estimate with every measurement update. Thus, the overall state space for our application has increased: 12 variables corresponding to the original states and 36 variables corresponding to the aerodynamical parameters.

TABLE II. PARAMETER ESTIMATION FROM SIMULATED DATA

Parameter | Real      | EKF       | UKF
a11       | 1.5276    | 1.2633    | 1.9660
a22       | 21.0933   | 20.8660   | 21.1896
a33       | 20.4220   | 20.2018   | 21.4239
a44       | 16.3905   | 15.9151   | 16.8548
a55       | 382.1290  | 380.0030  | 380.1463
a66       | 388.0972  | 379.9988  | 384.0155
a15       | 0         | 0.1309    | 0.0001
a24       | 0         | −0.0958   | 0.1628
a26       | −69.6939  | −59.9030  | −69.8978
a35       | 67.7045   | 70.0684   | 70.3361
a46       | 0         | 0.0801    | 0.0101
m13       | 1.2801    | 2.0621    | 1.5235
m33       | −49.7019  | −48.0249  | −48.5013
xm11      | 25.6919   | 23.0748   | 24.5183
xm22      | 23.6878   | 20.0129   | 21.0075
xm13      | −4.5582   | −9.1165   | −5.4699
x2m11     | −173.4906 | −150.0044 | −170.8227
x2m22     | −166.3538 | −149.9994 | −158.8524
CT1       | −0.5878   | −1.8974   | −0.6579
CT2       | 0.4452    | 0.5487    | 0.5789
CT3       | −0.8574   | −2.8752   | −0.6877
CL1       | 20.4587   | 50.8756   | 15.4789
CL2       | −10.5897  | −15.9874  | −11.5582
CL3       | −5.8537   | −1.2234   | −7.2243
CN1       | −0.0553   | −0.0417   | −0.0664
CN2       | 0.1069    | 0.1071    | 0.1069
CN3       | 0.205     | 1.0258    | 0.389
ClC1      | −0.0400   | −0.0405   | −0.0415
ClC2      | −0.1730   | −0.1919   | −0.1271
ClC3      | 0.2000    | 0.2930    | 0.2173
CmC1      | 0.0240    | 0.0205    | 0.0304
CmC2      | 0.9370    | 0.7975    | 0.9982
CmC3      | 0.0850    | 0.0170    | 0.0266
CnC1      | 0.9370    | 1.0833    | 0.6207
CnC2      | −0.7412   | −0.8574   | −0.7589
CnC3      | 0         | 0.0450    | −0.0010
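To make Section III concrete, here is a minimal numpy sketch of one UKF prediction/correction cycle following equations (14)-(22), with the state correction of equation (8). The function name, the simple sigma-point weighting, and the additive process/measurement noise covariances Q and R are illustrative assumptions (the paper does not spell out its weights or noise handling). For the parameter estimation of Section III-C, x would be the augmented 48-dimensional state (12 vehicle states plus the 36 aerodynamic parameters), with the parameter block simply copied through in f, as in equation (23).

```python
import numpy as np

def ukf_step(x, P, u, z, f, h, Q, R, lam=1e-3):
    """One UKF prediction/correction cycle.
    x: state mean (n,), P: state covariance, u: control input, z: measurement,
    f/h: process and measurement models, Q/R: assumed additive noise covariances."""
    n = x.size
    # Eq. (14): sigma points from the Cholesky factor of (n + lam) * P
    S = np.linalg.cholesky((n + lam) * P)
    chi = np.vstack([x, x + S.T, x - S.T])          # (2n+1, n) sigma points
    W = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))   # simple weighting scheme
    W[0] = lam / (n + lam)
    # Eqs. (15)-(16): propagate through the full nonlinearities
    chi_pred = np.array([f(c, u) for c in chi])
    y_pred = np.array([h(c) for c in chi_pred])
    # Eq. (17): predicted means
    x_pred = W @ chi_pred
    y_mean = W @ y_pred
    # Eqs. (18)-(20): covariances (Q, R added here as an assumption)
    dx = chi_pred - x_pred
    dy = y_pred - y_mean
    Pyy = dy.T @ (W[:, None] * dy) + R
    Pxy = dx.T @ (W[:, None] * dy)
    Ppred = dx.T @ (W[:, None] * dx) + Q
    # Eq. (21): gain, Eq. (8): state correction, Eq. (22): covariance correction
    K = Pxy @ np.linalg.inv(Pyy)
    x_new = x_pred + K @ (z - y_mean)
    P_new = Ppred - K @ Pyy @ K.T
    return x_new, P_new
```

In the augmented-state setting, each measurement update nudges the parameter block of x, which is why the estimates reported later (Figure 8) converge toward constant values.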

TABLE III. TABLE OF ERROR

State | Mean UKF | Variance UKF | Mean EKF | Variance EKF
x     | 0.0439   | 0.0015       | 0.0639   | 0.0016
y     | 0.0670   | 0.0025       | 0.0209   | 0.0027
z     | 0.0217   | 0.0038       | 0.0086   | 0.0040
φ     | 0.0626   | 0.0009       | 0.0651   | 0.0009
θ     | 0.0179   | 0.0001       | 0.0209   | 0.0003
ψ     | 0.0153   | 0.0171       | 0.0603   | 0.0018

IV. RESULTS

To determine the performance and reliability of our method we have carried out two particular tests. These examples show the strength of the method when it is applied to specific situations.

A. Test 1: Parameter estimation of AS500
This test is based on simulated signals, because the dynamical system has been determined from aerodynamical experiments [6], [10]. Here we make one approximation to the correct parameters of the system by using the Kalman-filter-based method proposed previously. We use N = 1500 samples and the Euler method for integrating the ODE [15] with a step size ΔT = 0.1. Notice that the integration step size may differ from the sampling time. The variables η and V are corrupted with a small additive Gaussian noise. Initial conditions for the parameters are chosen randomly, which affects the convergence of the overall method. After processing the signal, we estimate the parameters under the assumption that the dynamical model and the statistical features of the noise are known in advance. The results are shown in Table II. Fig. 3 shows the time evolution

of the parameter CT2 simulated by the proposed methods. The rest of the parameters behave in a similar way; to simplify the presentation we only exhibit a few of these graphics. Figures 4 and 5 compare the dynamics of the real model against the results of the simulation. These results were estimated by using the parameters shown in Table II. In order to check the reliability of our results, we use the relative error

$$e = \frac{|x - \hat{x}|}{|x|} \qquad (24)$$

where x̂ is the simulation that we obtain for the new parameters. Table III shows the mean and variance of the error in some of the states.


Fig. 3. Evolution of the parameter CT2.

Fig. 5. Comparison between the real model and the UKF estimation: t[s] vs V[m/s], z[m] and θ[rad].

movement of this test in a 3D view.

Fig. 4. Comparison between the real model and the EKF estimation: t[s] vs V[m/s], z[m] and θ[rad].

Fig. 6. Trajectory of Test 2, East[m] vs North[m], with the start and end points marked.

B. Test 2: Parameter estimation of KARMA

Summarizing, we use a real flight trajectory lasting 922.8 seconds, which gave us 917 values of position and speed. We also take 9235 values for the different angles involved in the dynamical system. Then, we synchronize the observations by taking into account the measurement instrument having the lowest frequency among all the measurements: this is the electronic compass. After the signal has been preprocessed, it is ready for estimating the parameters. We use here the UKF-based algorithm because it showed good performance when applied to the simulated experiments. Figure 8 shows the results obtained for several parameters taken from real flight trajectories. We stress that the convergence towards a constant value proves the soundness of our method, and this good performance takes place even in the presence of noisy data. Table IV shows all the estimated values for every dynamical parameter of the experimental dirigible in which we are interested; it also shows the covariance of the signal obtained in every case. We can use the parameters as a new model for the real dynamical system, then we can simulate its behavior with arbitrary input

After we prove that the method works for simulated signals, we can apply it to real signals from the robot KARMA. In this test we use real measurements of velocities and position reported by the GPS every second, and measurements of angles reported by the electronic compass every 0.8 seconds. These two instruments are placed in the nacelle of the dirigible and are connected to a CPU that stores the data on a hard disk. In order to estimate the parameters we must deal with the problem of non-synchronized measurements; we face this with special routines of data fusion, cf. [4]. The flight for the identification was made in cruise conditions, meaning that we keep a constant aerodynamical speed and a constant height. The maneuvers for this flight were basically turns, executed without slip and with a radius greater than 28.17 meters (3 times the length of the dirigible). To better visualize the maneuvers, we show a plot of East vs. North in Figure 6, and in Figure 7 we can see some part of the

Fig. 7. 3D view of the flight.

Fig. 8. Parameter convergence on a real model.

TABLE IV. PARAMETER ESTIMATION FROM REAL DATA

Parameter | UKF       | Covariance (Pii)
a11       | 11.2428   | 0.1580
a22       | 90.4922   | 0.0925
a33       | 70.5841   | 0.0922
a44       | 42.3387   | 0.0901
a55       | 383.7979  | 0.0837
a66       | 419.9314  | 0.0872
a15       | 6.9103    | 0.1309
a24       | 1.2382    | 0.1240
a26       | 195.3407  | 0.1269
a35       | 59.4323   | 0.1053
a46       | −28.5030  | 0.1053
m13       | −33.7772  | 0.0621
m33       | 93.7707   | 0.0982
xm11      | 76.4093   | 0.0905
xm22      | 54.7163   | 0.0192
xm13      | 75.3240   | 0.0962
x2m11     | 201.9972  | 0.0335
x2m22     | 224.8353  | 0.0896
CT1       | −2.9074   | 0.1290
CT2       | −13.9818  | 0.0949
CT3       | −0.7970   | 0.0767
CL1       | 15.0799   | 0.0744
CL2       | −7.6177   | 0.0644
CL3       | −3.2706   | 0.0249
CN1       | −2.1196   | 0.0676
CN2       | −0.2250   | 0.0446
CN3       | 0.6837    | 0.0508
ClC1      | −0.0725   | 0.1442
ClC2      | 2.9208    | 0.1509
ClC3      | 1.0168    | 0.0582
CmC1      | 5.1576    | 0.0538
CmC2      | −1.8937   | 0.0814
CmC3      | −1.1017   | 0.0762
CnC1      | −0.1082   | 0.0942
CnC2      | −0.5101   | 0.0415
CnC3      | 0.0115    | 0.0227

data. In particular, we take the same input signals that we used in the real performance, to compare the simulation against the real responses. Fig. 9 shows this comparison. Finally, we conclude that the UKF algorithm gives reliable parameter estimates from incomplete and noise-corrupted experimental data of flying robots.

V. CONCLUDING REMARKS

In this work we have shown two different techniques for parameter estimation based on Kalman filtering. These methods were applied to real data to estimate the aerodynamic parameters of experimental dirigibles under nonlinear flight conditions. The algorithms proposed here are the EKF and the UKF. Both of them have shown good computational performance. Remarkably, the UKF algorithm presents better performance because it can work even with chaotic dynamical systems. This algorithm involves a routine to approximate the solution of nonlinear differential equations; as it strongly depends on the numerical solution of the differential equations, we take a fourth-order Runge-Kutta scheme as the most appropriate routine. The EKF can be used with other numerical routines intended for

Fig. 9. Comparison between the model and the real data: t[s] vs Vx[m/s], y[m], φ[rad].

solving ODEs, but there is no advantage in doing so, as these routines usually require the expensive computation of the Jacobian or Hessian matrices. We stress that the choice of the initial approximation is


crucial for the convergence of the EKF algorithm. However, this is not the case for the UKF algorithm, as it performs well even for chaotic dynamical systems. Although the performance of the UKF algorithm does not depend on the initial approximations of the parameters, it strongly depends on the initial choice of the covariance matrices. The parameter identification of robots used as dirigibles is a very hard task because it must normally be addressed by using experimental aerodynamic measurements in wind tunnels. The most remarkable achievement of this work is that we can estimate such nonlinear dynamical parameters by doing numerical simulations based on Kalman filtering. One particular limitation of our approach is that we need to know the specific model of the dynamical system in advance. We will focus on this particular limitation in future work.

ACKNOWLEDGMENTS
Universidad de los Andes. L.S. thanks COLCIENCIAS and its programs National Doctorates and Bomplan. D.P. thanks the Departamento de Matemáticas, Dr. Rene Meziat for his help and support, and the research grant 1204-05-13627-COLCIENCIAS-2004.

REFERENCES
[1] Sitz A., Schwarz U., et al. Estimation of parameters and unobserved components for nonlinear systems from noisy time series, Physical Review E, Statistical, Vol. 66, No. 1, 2002.
[2] Sayed A. A Framework for State-Space Estimation with Uncertain Models, IEEE Transactions on Automatic Control, Vol. 46, No. 7, July 2001.
[3] Saab S. A Heuristic Kalman Filter for a Class of Nonlinear Systems, IEEE Transactions on Automatic Control, Vol. 49, No. 12, December 2004.
[4] Leung, H. An Aperiodic Phenomenon of the Extended Kalman Filter in Filtering Noisy Chaotic Signals, IEEE Transactions on Signal Processing, Vol. 48, No. 6, June 2000.
[5] Julier S.J., Uhlmann J.K. A New Extension of the Kalman Filter to Nonlinear Systems. In Proc. of AeroSense: The 11th Int. Symp. on Aerospace/Defence Sensing, Simulation and Controls, 1997.
[6] S. Lacroix, I.-K. Jung, et al. The autonomous blimp project of LAAS/CNRS: Current status and research challenges. In 8th International Symposium on Experimental Robotics, Sant'Angelo d'Ischia, Italy, 2002.
[7] Chui C.K., Chen G. Kalman Filtering with Real-Time Applications, Springer, Germany, 1999.
[8] Grewal S.M., Andrews P.A. Kalman Filtering: Theory and Practice, Prentice Hall, USA, 1993.
[9] E. Hygounenc, I.-K. Jung, P. Souères and S. Lacroix. The Autonomous Blimp Project of LAAS-CNRS: Achievements in Flight Control and Terrain Mapping, Special Issue on the 8th International Symposium on Experimental Robotics, 2002.
[10] Hygounenc E., Souères P. Dynamic and aerodynamic modeling of a small size airship. In 5th Workshop on Electronics, Control, Modeling, Measurement and Signals, Toulouse, France, June 2001.
[11] Wan E.A., Merwe R. The Unscented Kalman Filter for Nonlinear Estimation, Oregon Graduate Institute of Science and Technology, 2000.
[12] Kalman, R.E. A new approach to linear filtering and prediction problems, Transactions of the ASME, Journal of Basic Engineering, Vol. 82, 1960.
[13] Schmidt, G.T. Practical Aspects of Kalman Filtering Implementation, AGARD-LS-82, NATO Advisory Group for Aerospace Research and Development, England, 1976.
[14] Bryson, A.E. The aerodynamic forces on a slender low (or high) wing-body-vertical-tail configuration, Journal of the Aeronautical Sciences, Reader's Forum, Vol. 21, No. 8, page 574, August 1954.
[15] Burden, R.L. Numerical Analysis, Edamsa Printers, 2002.
[16] Haykin, S. Kalman Filtering and Neural Networks, Wiley-Interscience, 2001.
[17] Bertsekas, D.P. Nonlinear Programming, Athena Scientific, 1995.



End-point Control of a Flexible Structure Mounted Manipulator Based on Wavelet Basis Function Networks David X.P. Cheng Defence Research and Development Canada - Suffield PO Box 4000, Medicine Hat, Alberta T1A 8K6, Canada [email protected] Abstract

ent aspects of the M 3 system. Nenchev et al. [1] utilized the reaction null space concept to decompose the joint space of the flexible base and the micro manipulator into orthogonal subspaces such that no vibrations of the flexible base are induced when the end-effector moves along a reactionless reference path. Yoshikawa et al. [2] used the redundancy of the M 3 system to generate joint trajectories that enable the micro manipulator to maximize its ability to compensate for the tracking error resulting from the deformation of the macro manipulator. The dynamics of the system are not considered in this control scheme and therefore it is not suitable for a fast tracking control task. Yoshikawa et al. [3] modified their previous work by taking into account the system dynamics. However, the problem of closed-loop stability and vibration damping in the macro-manipulator were not addressed in either of these papers. Cheng and Patel [4, 5] proposed a control scheme to suppress vibrations in the macro manipulator while achieving stable desired motions of the end-effector of the micro manipulator. A multi-layer perceptron (MLP) network is utilized to estimate the nonlinear dynamic behavior for the M 3 system, and the resulting estimates are used to develop controllers for the macro and micro manipulators without any need for prior knowledge of the dynamic model of the M 3 system. Under the MLP based control scheme, both the tracking error of the end-effector of the micro manipulator and the vibration in the macro manipulator are rapidly suppressed and constrained within an arbitrarily small vicinity of the origin, while the magnitudes of the joint torques are kept bounded. However, the number of weights required to be adjusted in the MLP network may rapidly increase as the size of the network increases.

In this paper, we present a wavelet basis function (WBF) network based control scheme for end-point tracking control of a system consisting of a rigid micro manipulator attached at the end of a flexible macro manipulator. The objective is to suppress vibrations in the macro manipulator and at the same time achieve desired motions of the end-effector of the micro manipulator. A WBF network is utilized to approximate the dynamic behavior of the macromicro manipulator (M 3 ) system in real time, and the controller is developed without any need for prior knowledge of the dynamics. A weight-tuning algorithm for the WBF network is derived using Lyapunov theory. It is shown that both the path tracking error and the damped vibrations are uniformly ultimately bounded under this new control scheme. A case of end-point tracking via kinematic redundancy is studied and the results are compared to those obtained using an MLP network controller and a PD joint controller.

1. Introduction Long-reach manipulators have been proposed for a range of applications that include Space Station maintenance and operation, and nuclear waste disposal. In such applications, the lightweight structure of long-reach manipulators allows the actuators to move faster and carry heavier loads than conventional rigid manipulators. However, the significant structural flexibility makes it difficult to control the position and force at the end-effector accurately and reliably. The incorporation of a small, rigid micro manipulator at the tip of a large, flexible macro manipulator has been proposed as a solution to achieve the desired accurate and robust performance. In order to utilize the macro-micro manipulator (M 3 ) system effectively, one must address the problem of controlling and compensating for vibrations resulting from the flexibility in the macro-manipulator links. Considerable research has been done to address differ-

To reduce the computational complexity while preserving the advantages of the MLP network based control scheme described above, we propose in this paper to use wavelet basis function (WBF) networks for learning the dynamics of the M 3 system and use the end-point feedback from visual tracking for formulating the joint torque command. The closed-loop stability of the control scheme is guaranteed while fast, precise end-point tracking is



where θ˜M = θM − θMd , δ˜ = δ − δd , θ˜m = θm − θmd , G and Λi , i = 1, 2, 3, are symmetric positive-definite matrices, κ > 0 is a design parameter. The vector r can then be expressed in a matrix form

achieved. The performance of the control scheme is studied on a redundant M 3 system working in a 3-D task space.

2. Control Schemes Based on Wavelet Basis Function Networks

r = Γx˙˜ + Γ˙ x˜ + ΛΓx˜ where Λ = diag{Λ1 , Λ2 , Λ3 } and  IM 0  J J Γ = θM θm 0 0

2.1. Tracking Error Dynamics Macro Manipulator

Micro Manipulator

 κG Jδ  Ie

(9)

(10)

with J˙(·) representing the time derivatives of the Jacobians, and IM and Ie representing the M × M and e × e identity matrices, respectively. Then the dynamics of the M 3 system can be expressed in terms of the filtered tracking error as ˙ = f − τd + τ Mϒ˙r + (Cϒ + M ϒ)r

Figure 1. Schematic of a macro-micro manipulator

where

Consider a system consisting of a macro manipulator with M flexible links and a micro manipulator with m rigid links (Figure 1) with the equations of motion given by M(x)x¨ +C(x, x) ˙ x˙ + G(x) + τd = τ

(1)

(2)

where u can be chosen as  T u , x˜T , x˙˜T , xTd , x˙Td , x¨Td

Let u ∈ Rµ and f ∈ L 2 (Rµ ) on a compact set S ⊂ R µ . By the multi-resolution analysis theory [6], for a given positive number εN > 0, there exist a collection of wavelet basis functions {ϕ j0 ,k : k ∈ K} with a sufficiently small scale j0 such that

and define the tracking error x˜ and the filtered tracking error  T r = r1 T , r2 T , r3 T by h iT x˜ , x − xd = θ˜MT , θ˜mT , δ˜ T (5) r2 , p˙˜ + Λ2 p˜ r , δ˙˜ + Λ δ˜

(7)

3

3

(14)

2.2. Controller Design

(4)

(6)

(13)

Expressing the error dynamics as (11) will enable us to derive a WBF network based control scheme in the next section.

(3)

r1 , θ˙˜M + Λ1θ˜M + κ Gδ˙˜ + Λ1 κ Gδ˜

 −κ G Jθ−1 (Jδ − κ JθM G) (12) m Ie

f (u) = −G − M x¨d −Cxd + Mϒ(2Γ˙ x˜˙ + Γ¨ x+ ˜ ˙ Λ(Γx˜˙ + Γ˙ x)) ˜ +Cϒ(Γ˙ x˜ + ΛΓx) ˜ + M ϒr

where JθM ∈ Rn×M , Jθm ∈ Rn×m and Jδ ∈ Rn×e are Jacobian matrices of p with respect to θM , θm and δ respectively, and θ˜M , θ˜m and δ˜ denote small changes in θM , θm and δ respectively. Denote the desired trajectory as T T xd = [θMd , θmd , δdT ]T

0 −Jθ−1 m 0

and f is an unknown nonlinear function of parameters such as the inertia, centrifugal and Coriolis terms, DH parameters, etc., of the M 3 system, and is given by

where θM and θm are the joint positions of the macro and micro arms, respectively, and δ is the flexural displacement vector of the macro arm. A small variation in the tip position p can be expressed as p˜ = JθM θ˜M + Jθm θ˜m + Jδ δ˜

IM

J ϒ , Γ−1 = −Jθ−1 m θM 0

where M is the inertia matrix, C is the matrix of the Coriolis and centrifugal terms, G is the vector of gravity and elastic torques, τd represents unmodelled dynamics. The position of the end-effector p is given by p = p(θM , θm , δ )



(11)

fi (u) =

∑ ci, j0 ,k ϕ j0 ,k (u) + ε (u),

k∈K

and

kε (u)k < εN ,

i = 1, 2, · · · , n

∀u∈S

(15)

(16)

where ε (u) is the reconstruction error and

ϕ j0 ,k (u) = 2− j0 /2 ϕ (2− j0 u − k) ci, j0 ,k = h fi , ϕ j0 ,k i

(8)

30

(17) (18)

IEEE IROS 2005 Workshop on Robot Vision for Space Applications

tracking control of the closed-loop flexible M 3 system. The algorithm is determined by the following equation:

Let W = {ci, j0 ,k } and ϕ = [ϕ j0 ,k1 , ϕ j0 ,k2 , · · · , ϕ j0 ,kN ]T , where k1 , k2 , · · · , kN are the translation vectors in the index set K. We can then express (15) in a matrix form: T

f (u) = W ϕ (u) + ε

 ˆ W˙ˆ = P ϕ · r¯T − µ k¯rkW

(19)

where P = PT is any constant symmetric positive-definite matrix, µ > 0 is a small scalar design parameter, and ϒ is a nonlinear transformation given by (12). A block diagram of the M 3 control system is shown in Figure 2.

We assume that on any compact subset of Rn , the weighting matrix is bounded, i.e., kW kF ≤ bW

(20)

where k · kF denotes the Frobenius norm. The approximation given by (19) can be implemented by a wavelet basis function (WBF) network. The weights in the input layer of the WBF network are identically set to 1 while the weights in the output layer represent the weighting coefficients of the basis functions ϕ j0 ,k in the wavelet expansion of f (u) given by (15). Let an estimate of the nonlinear function f (u) defined in (13) using a WBF network be given by fˆ(u) = Wˆ T ϕ (u)

D =C 7B8 < :A =C ; 2 4>13 5 = 7?8 =< 8 7?

r¯ = [¯r1T , r¯2T , r¯3T ]T = ϒr,

D

   

H

E

D

D            H

JIG E

       

 F

JI

K H

465 L

F G E

!" #$&%' !( #$&% !"&) ( * +, " - % $ . / .&- 0 !

     

@

(21)

JI G

7? 8 :> =< 0 and kξ > 0 are positive design constants, ϒ is a nonlinear transformation given by (12), and ξi , i = 1, 2, 3, are robustifying components incorporated to compensate for the reconstruction errors of the neural network and the unmodelled dynamics.

(27)

where tr{·} denotes the trace of a square matrix. Differentiating V along the solution of the error dynamics system yields 1 ˜ T P−1W˙˜ } V˙ = rT ϒT (M˙ − 2C)ϒr + rT ϒT ( f + τ − τd ) + tr{W 2 (28)

2.3. Weight Tuning Algorithm

Using r¯ = ϒr and the skew-symmetric property of M˙ − 2C, we obtain

Based on the control scheme given by (22) through (25), a weight tuning algorithm for the WBF network can be designed as a natural extension of the weight tuning algorithm for the MLP network developed in [4] for achieving stable

V˙ = r¯1T ( f1 + τM ) + r¯2T ( f2 + τm ) + r¯3T f3 − r¯T τd ˜ T P−1W˜˙ } + tr{W

31

(29)
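The pieces that are fully specified above can be sketched directly: the WBF approximation f̂(u) = Ŵᵀφ(u) of (21) with radial Mexican hat activations as in (40), and a simple Euler discretization of the tuning law (26). The wavelet centres, gain matrices and step size below are placeholders; the torque laws themselves are not sketched here.

```python
import numpy as np

def wbf_features(u, centres, mu):
    """Radial Mexican hat activations, one per hidden unit (cf. Eqn. 40).
    centres: N x dim array of translation vectors (illustrative choice)."""
    phi = []
    for c in centres:
        d = u - c
        r2 = float(d @ d)
        phi.append((mu - r2) * np.exp(-r2 / 2.0))
    return np.asarray(phi)                       # shape (N,)

def wbf_estimate(W_hat, phi):
    """f_hat(u) = W_hat^T phi(u), Eqn. (21); W_hat has shape N x n."""
    return W_hat.T @ phi

def tune_weights(W_hat, phi, r_bar, P, mu_gain, dt):
    """One Euler step of the tuning law (26):
    W_hat_dot = P (phi r_bar^T - mu ||r_bar|| W_hat)."""
    W_dot = P @ (np.outer(phi, r_bar) - mu_gain * np.linalg.norm(r_bar) * W_hat)
    return W_hat + dt * W_dot
```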

Substituting the control laws (22) and (23) into the above equation, we obtain

$$\dot{V} = \bar{r}_1^T\Big(f_1 - \hat{f}_1 + \xi_1 - \frac{\bar{r}_1\bar{r}_3^T}{\|\bar{r}_1\|^2 + \epsilon}(\hat{f}_3 - \xi_3 + K_3\bar{r}_3) - K_1\bar{r}_1\Big) + \bar{r}_3^T f_3 + \bar{r}_2^T(f_2 - \hat{f}_2 + \xi_2 - K_2\bar{r}_2) - \bar{r}^T\tau_d + \mathrm{tr}\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\}$$

or

$$\dot{V} \le -\bar{r}^T K\bar{r} + \bar{r}^T\tilde{f} + \bar{r}^T\xi - \bar{r}^T\tau_d + \mathrm{tr}\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\} \qquad (30)$$

where $K = \mathrm{diag}\{K_1, K_2, K_3\}$, $\tilde{f} = f - \hat{f}$, and $\xi = [\xi_1^T, \xi_2^T, \xi_3^T]^T$. From (19) and (21), the second term on the right-hand side of (30) can be expressed as

$$\bar{r}^T\tilde{f} = \bar{r}^T\big(W^T\varphi - \hat{W}^T\varphi + \varepsilon\big) = \bar{r}^T\tilde{W}^T\varphi + \bar{r}^T\varepsilon \qquad (31)$$

From (24), the third term on the right-hand side of (30) can be expressed as

$$\bar{r}^T\xi = -\bar{r}^T\cdot k_\xi\frac{\bar{r}}{\|\bar{r}\|} = -k_\xi\|\bar{r}\| \qquad (32)$$

From (26), the last term on the right-hand side of (30) can be expressed as

$$\mathrm{tr}\{\tilde{W}^T P^{-1}\dot{\tilde{W}}\} = -\mathrm{tr}\{\tilde{W}^T P^{-1}\dot{\hat{W}}\} = -\mathrm{tr}\big\{\tilde{W}^T P^{-1}P\big(\varphi\,\bar{r}^T - \mu\|\bar{r}\|\hat{W}\big)\big\} = -\bar{r}^T\tilde{W}^T\varphi + \mu\|\bar{r}\|\,\mathrm{tr}\{\tilde{W}^T\hat{W}\} \qquad (33)$$

Substituting (31), (32) and (33) into (30) we obtain

$$\dot{V} \le -\bar{r}^T K\bar{r} + \bar{r}^T\tilde{W}^T\varphi + \bar{r}^T\varepsilon - k_\xi\|\bar{r}\| - \bar{r}^T\tau_d - \bar{r}^T\tilde{W}^T\varphi + \mu\|\bar{r}\|\,\mathrm{tr}\{\tilde{W}^T\hat{W}\}$$
$$\le -\bar{r}^T K\bar{r} - k_\xi\|\bar{r}\| + \bar{r}^T(\varepsilon - \tau_d) + \mu\|\bar{r}\|\,\mathrm{tr}\{\tilde{W}^T\hat{W}\}$$
$$\le -\bar{r}^T K\bar{r} - k_\xi\|\bar{r}\| + \bar{r}^T(\varepsilon - \tau_d) + \mu\|\bar{r}\|\big(b_W\|\tilde{W}\|_F - \|\tilde{W}\|_F^2\big)$$
$$\le -\|\bar{r}\|\Big\{\lambda_{\min}(K)\|\bar{r}\| + \mu\big(\|\tilde{W}\|_F^2 - b_W\|\tilde{W}\|_F\big) - \varepsilon_N\Big\} \qquad (34)$$

where we have applied the known condition $k_\xi > b_d$ and the inequality $\mathrm{tr}\{\tilde{W}^T\hat{W}\} \le b_W\|\tilde{W}\|_F - \|\tilde{W}\|_F^2$ [4] to augment the right-hand side of (34). Completing the square results in

$$\dot{V} \le -\|\bar{r}\|\Big\{\lambda_{\min}(K)\|\bar{r}\| + \mu\big(\|\tilde{W}\|_F - \tfrac{b_W}{2}\big)^2 - \tfrac{\mu b_W^2}{4} - \varepsilon_N\Big\} \qquad (35)$$

Therefore, $\dot{V} < 0$ if

$$\|\bar{r}\| > \frac{1}{\lambda_{\min}(K)}\Big(\frac{\mu b_W^2}{4} + \varepsilon_N\Big) \triangleq b_{\bar{r}} \qquad (36)$$

or

$$\|\tilde{W}\|_F > \frac{b_W}{2} + \sqrt{\frac{b_W^2}{4} + \frac{\varepsilon_N}{\mu}} \triangleq b_{\tilde{W}} \qquad (37)$$

Here $b_{\bar{r}}$ and $b_{\tilde{W}}$ denote the regions of convergence for the filtered tracking error and the weight estimation error in the controller, respectively. $\dot{V}$ becomes negative and $V$ decreases outside the compact set defined by $\|\bar{r}\| \le b_{\bar{r}}$ and $\|\tilde{W}\|_F \le b_{\tilde{W}}$. According to the LaSalle extension of Lyapunov analysis, this implies that both $r$ and $\tilde{W}$ are uniformly ultimately bounded (UUB).

3. End-point Tracking by Kinematic Redundancy

To demonstrate the effectiveness of the proposed control scheme, we have applied it to an M³ system with one flexible boom and four rigid micro-arms. The flexible boom has a thickness of 1.3 mm, a width of 3.14 cm, a length of 1 m, a mass of 1 kg and a flexural rigidity of 20 Nm². The first and second rigid links have equal lengths of 0.1 m and equal masses of 0.1 kg, and the third and fourth have equal lengths of 0.2 m and equal masses of 0.2 kg. The assumed-modes method with clamped-mass boundary conditions was used to model the M³ system. Two orthonormal mode shapes were taken into account to simplify the inertia and stiffness matrices. The resulting natural frequencies of vibration are 2.50 Hz and 15.6 Hz. The redundancy of the micro manipulator was fully used to suppress vibrations in the flexible boom. All the joints of the M³ system were controlled to track the end-point trajectory given by

$$x_d = 0.3\cos(2t) + 0.80, \qquad y_d = \frac{\sqrt{3}}{25\pi}\,t + 0.5\sqrt{3}, \qquad z_d = 0.1\sin(2t) \qquad (38)$$

which is a helix with a radius of 0.2 m. The desired joint velocity $\dot{\theta}_d$ is given by

$$\dot{\theta}_d = J^+\dot{p}_d + (I - J^+J)\eta \qquad (39)$$

where $J^+$ is the pseudo-inverse of $[J_{\theta_M}, J_{\theta_m}]$ and $\eta$ is an arbitrary 4 × 1 vector. The desired joint trajectory $\theta_d$ is obtained by integrating the above equation.

A WBF network was designed to approximate the unknown nonlinear function $f$. The inputs to the network were taken as $u = [\tilde{x}^T, \dot{\tilde{x}}^T, x_d^T, \dot{x}_d^T, \ddot{x}_d^T]^T$. Twenty-five hidden units were used in the network, with the activation wavelets selected to be the radial Mexican hat functions given by

$$\varphi(u) = (\mu - u^Tu)\,e^{-u^Tu/2}, \qquad u \in \mathbb{R}^\mu \qquad (40)$$
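A minimal numpy sketch of the redundancy resolution in Eqn. (39), using the Moore-Penrose pseudo-inverse; the Jacobian, desired end-point velocity and null-space vector η are placeholders supplied by the caller, not values from the paper.

```python
import numpy as np

def desired_joint_velocity(J, p_dot_d, eta):
    """theta_dot_d = J+ p_dot_d + (I - J+ J) eta, as in Eqn. (39).
    J is the stacked Jacobian [J_thetaM, J_thetam]; eta is projected onto
    the null space of J, so it does not disturb the end-point velocity."""
    J_pinv = np.linalg.pinv(J)                      # Moore-Penrose pseudo-inverse
    null_space_proj = np.eye(J.shape[1]) - J_pinv @ J
    return J_pinv @ p_dot_d + null_space_proj @ eta
```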

The WBF network has a total of 150 weights, which require no prior knowledge of the parameters of the M³ system dynamics; the weights were simply initialized at zero and updated on-line by the weight tuning algorithm given by (26).

Figure 3. Arm motions of the M³ system with kinematic redundancy (t = 3.25-6.25 sec).

Figure 3 shows the sequences of arm motions controlled by the WBF network for the end-effector of the M³ system to track the desired helix in 3-D task space. Figures 4 to 6 compare the results of the tracking control of the M³ system using the WBF network controller against those using an MLP controller [4]. We can see from Figures 4 and 5 that the amount of vibration in the macro arm was effectively constrained to small regions around the origin by the WBF as well as the MLP controller, with the WBF controller being slightly more efficient than the MLP controller in suppressing the vibrations. Figure 6 shows the 2-norms of the task space tracking errors. We can observe that stable and robust tracking was achieved by both the MLP and WBF controllers. In this test case, the overall performance of the WBF controller is comparable to that of the MLP controller in the sense of tracking accuracy. Further comparison of the WBF and MLP controllers is provided in Table 1, where we compare the two controllers in terms of computational complexity. It can be seen from Table 1 that the total number of weights to be tuned in the WBF network (150) is much smaller than that required by the MLP network (376), which makes the WBF controller faster in learning the dynamics of the M³ system. Figure 7 compares the tracking errors in task space using the WBF network based controller and a PD controller. The PD controller was implemented by shutting down the WBF network in the outer loop of the system and using only the state position and velocity feedback in the inner loop. It is shown that the tracking error is significantly reduced under the proposed WBF network based control scheme. This is because the WBF network controller is capable of suppressing the vibrations in the flexible boom induced by the acceleration of the micro manipulator during the end-point tracking.

Figure 4. Flexural deflections (1st mode) of the macro manipulator (MLP: —, WBF: - -).

Table 1. Comparison of the WBF and MLP controllers.

Item                                       | WBF network   | MLP network
n_h (number of hidden units)               | 25            | 10
n_W (number of weights)                    | 150           | 376
n_I (number of integrators)                | 150           | 376
||e|| (average 2-norm of tracking error)   | 8.532 x 10^-3 | 7.296 x 10^-3
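The ||e|| entry of Table 1 is the time-averaged 2-norm of the Cartesian tracking error; assuming the per-sample errors are logged, the metric is simply the following (an illustrative sketch, not the authors' evaluation code):

```python
import numpy as np

def average_error_norm(errors):
    """errors: T x 3 array of logged Cartesian tracking errors (m)."""
    return float(np.mean(np.linalg.norm(errors, axis=1)))
```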

Figure 5. Flexural deflections (2nd mode) of the macro manipulator (MLP network: —, WBF: - -).

Figure 6. Cartesian tracking errors and error norms (MLP: —, WBF: - -); 2-norm of the Cartesian tracking error (m) plotted against time (sec).

Figure 7. Cartesian tracking errors and error norms (WBF network: —, PD: - -).

4. Conclusions

In this paper, we present a novel control scheme for time-varying end-point tracking control of a macro-micro manipulator (M³) system based on wavelet basis function (WBF) networks. The control scheme allows us to constrain the tracking errors of the micro manipulator, in the presence of vibrations due to the flexibility of the macro manipulator links, within an arbitrarily small region around the origin by applying bounded control torques at the joints of the M³ system. The WBF network is designed to perform the learning and control tasks online simultaneously, and no off-line training procedure is required for the neural network to identify the dynamic model of the M³ system. The stability and convergence properties of the control scheme provide assurances of the reliability needed to make the controller feasible in practical real-time control. The performance of the control scheme is tested and compared to that of an MLP network controller and a PD joint controller on a four-link rigid micro manipulator attached at the tip of a flexible structure. It is shown that the WBF network controller significantly improves the tracking performance over the PD controller, and that it achieves performance comparable to that of the MLP controller while significantly reducing the amount of online computation for updating the weights of the network.

References

[1] D.N. Nenchev, K. Yoshida, P. Vichitkulsawat, and M. Uchiyama, "Reaction null-space control of flexible structure mounted manipulator systems," IEEE Transactions on Robotics and Automation, vol. 15, pp. 1011-1023, 1999.
[2] T. Yoshikawa, K. Hosoda, T. Doi, and H. Murakami, "Quasi-static trajectory tracking control of flexible manipulator by a macro-micro manipulator system," in Proceedings of the IEEE International Conference on Robotics and Automation, 1993, pp. 210-215.
[3] T. Yoshikawa, K. Hosoda, T. Doi, and H. Murakami, "Dynamic trajectory tracking control of flexible manipulator by a macro-micro manipulator system," in Proc. IEEE Int. Conf. Robotics and Automation, San Diego, CA, June 1994, pp. 1804-1809.
[4] D.X.P. Cheng and R.V. Patel, "Neural network based tracking control of a flexible macro-micro manipulator system," Neural Networks, vol. 16, pp. 271-286, 2003.
[5] D.X.P. Cheng and R.V. Patel, "Vibration control of a flexible macro-micro manipulator system using neural networks," in Proc. 15th IFAC World Congress, Barcelona, Spain, 2002.
[6] S.G. Mallat, A Wavelet Tour of Signal Processing, 2nd edition, Academic Press, New York, NY, 1999.

ARTag Fiducial Marker System Applied to Vision Based Spacecraft Docking

Mark Fiala
National Research Council of Canada
Bldg M-50, 1200 Montreal Rd., Ottawa, Canada K1A-0R6
Email: [email protected]

Abstract— Computer vision systems can be used for spacecraft docking where one or more cameras on one craft detect visual features on the other. Such a system can be used in the final tens or hundreds of metres of a docking procedure. Relative pose can be determined accurately and inexpensively using machine vision. Fiducial marker patterns can be used to robustify the vision processing and provide high reliability and confidence in the machine vision's result. The ARTag fiducial marker system encodes digital codes in 2-D marker patterns and was designed to maximize and quantify reliability of detection. ARTag markers are used in a multi-scale array to facilitate both a large distance range and enhanced reliability. The relative pose between the two spacecraft is determined automatically. A real-time prototype is shown.

I. INTRODUCTION

The final stages of docking two spacecraft can be accomplished using fiducial markers mounted on one craft and a camera (or cameras) mounted on the other. Automatic processing of this imagery (known as computer vision or machine vision) can provide the relative pose (position and orientation) between the two craft. Examples include the space shuttle docking with the International Space Station, or a robotic service craft docking with a satellite to repair or alter its orbit. The fiducial markers would be mounted on the space station or satellite at different locations and sizes so that several are always in view as the docking craft approaches.

Fig. 1. Proposed mounting of ARTag fiducial markers on the International Space Station (ISS) to facilitate automatic docking. (Left) Close-up of the ISS docking module. (Right) Large markers placed elsewhere on unused surfaces for detection at greater distances.

Computer vision systems are a natural choice for docking systems since video cameras can be low cost and provide high angular accuracy. These cameras are also present on manned spacecraft, and their imagery can be used for pose determination. Computer vision relies on being able to automatically recognize and track features visible in the camera image. Computer vision has the advantage of being able to track the position of features with a high degree of accuracy compared to human vision; however, automated methods often fall short in the task of recognition. Features such as a circular white dot on a black background can look similar to many other round features that are not the intended feature. System integrators, with good reason, have been slow to adopt computer vision in many critical systems due to issues of robustness. Fiducial marker systems are created to address this issue for a whole host of robotics, augmented reality, photo-modeling and other applications where it is important to be able to find correspondences between image points and points in 3D space.

ARTag is a recently designed fiducial marker system intended to provide a high degree of reliability and functionality in uncontrolled lighting situations. Like other fiducial marker systems, ARTag consists of a library of 2D patterns which are detected in digitized camera images. ARTag markers were specially designed for a vanishingly low false detection rate (where a marker is falsely reported when no real marker is present) and a very low inter-marker confusion rate (where one marker is mistaken for another). These markers, very robust on their own, are combined into arrays to provide yet higher confidence and to span a larger space, providing a more accurate pose determination. Since there are up to 2002 possible ARTag markers, they do not need to be repeated, and many can be placed around a spacecraft with varying sizes to permit detection at far and close distances.

II. FIDUCIAL MARKER SYSTEMS

Fiducial marker systems consist of the printed patterns and the computer vision algorithms that identify them in an image. Some numerical metrics for appraising marker systems are: 1) the false positive rate, 2) the inter-marker confusion rate, 3) the false negative rate, 4) the minimal marker size, 5) the vertex jitter characteristics, 6) the marker library size, and 7) the speed performance. The false positive rate is the rate of falsely reporting the presence of a marker when none is present. The inter-marker confusion rate is the rate at which a marker is detected but given the wrong ID, i.e. one marker is mistaken for another. The false negative rate is the probability

that a marker is present in an image but not reported. Another metric is the minimal marker size, which is the size in pixels required for reliable detection; the smaller a marker can be in the image, the larger the range of distances from the camera over which the marker system can be used. The library size is the number of unique markers that the system can support. Speed performance is also a consideration: a vision based fiducial marker tracking system must run in real time with the available computing power. Two other factors of fiducial marker system performance, not quantitatively measured herein but described loosely, are the immunity to lighting conditions and to occlusion. Occlusion occurs when a fiducial marker is partially or completely blocked from the camera's view by some other object; a marker system should still be able to detect a marker despite partial occlusion. Space provides extreme lighting variation, and a marker system destined for space docking should be able to detect markers under widely varying lighting.

There are several fiducial marker systems available, some of which are shown in Fig. 2; ARTag is the system used in this paper. Most use a two step process of first locating an identifying feature, followed by a verification and identification step. Most use a quadrilateral boundary as the identifying feature, with the interior containing the identification and verification information. The four corners can be used to create a homography to sample the interior in a perspectively correct manner. ARToolkit uses correlation methods to attempt to identify the interior (making it sensitive to camera properties and lighting), while most others attempt a symbolic encoding method with the interior divided into several cells which are coloured white or black. The rotation of the quadrilateral markers must be found in order to correctly interpret the interior; Canon's marker [5] and Binary Square Marker [3] use the four corner cells of their 3x3 or 4x4 grids respectively, leaving 5 or 12 digital bits to encode identity. Cybercode [12] uses a black bar to locate one side, and can be extended to an arbitrary size, but does not provide information that allows it to be detected under perspective distortion. The ARVIKA markers (SCR, IGD, HOM)¹ use a 4x4 array; information is not available for these proprietary markers, but it is clear that they can store either 16 or 20 bits. The ARTag marker system [6], [8] also has a quadrilateral border but has 36 internal cells; how these are used to achieve high reliability is described in the next section. ARToolkit Plus² is a new system, inspired by the high reliability of ARTag, but not containing all of ARTag's features such as edge-based detection and Hamming distance analysis. "Blob" analysis can be used instead of quadrilateral outlines: connectivity analysis is performed on a binarized image to identify adjacent and enclosed regions. Several commercially available circular fiducial marker systems exist, using a single broken annular ring around a circular dot, such as Photomodeler's "Coded Marker Module"³. Naimark and Foxlin [10] describe a system extending the number of rings to carry 15 bits of digital information. Bencina's "reacTIVision" marker [2] builds a topology tree of which blobs are contained inside other blobs; trees matching a set library are used to find markers.

¹ http://www.arvika.de/www/index.htm
² http://studierstube.org/handheld_ar/artoolkitplus.php
³ http://www.photomodeler.com

Fig. 2. Fiducial marker systems.

Most of these marker systems are not available for evaluation, due to non-disclosure or to not being fully developed systems (many are created only for an application in an academic paper or two and then discarded). There are several comparisons of the available and developed systems. Zhang [13] and Claus [4] survey several fiducial marker systems (the ARVIKA markers and ARToolkit) with respect to processing time, identification, and image position accuracy as a function of viewing angle and distance. ARToolkit is popular because it is simple, relatively robust, and freely available, and it is used in many augmented reality applications. ARTag and ARToolkit are compared in [9].

How the unique features are found influences the reliability of the system with respect to the false negative rate, which should ideally be as low as possible, meaning that the detection stage does not often miss a marker that is in view. Most of the above systems use binary morphology to find blobs and quadrilateral contours; this relies on a greyscale threshold operation to divide white and black pixels. This thresholding can use a single greyscale value, but a single value will often not be able to separate all white from all black pixels in an image: the white in one area of the image is often darker than the black in another, or vice versa. Automatic thresholding, either on the neighborhood around a pixel or on a histogram of the entire image, is often applied. However, automatic thresholding only works in some cases, and the aperture problem of how large a neighborhood window to use causes problems and artifacts. An edge-based method to find polygon contours solves this problem in most cases, provides a substantial improvement in lighting performance, and allows the use of simple heuristics to close borders broken by occlusion or by markers extending beyond the image boundary. It appears ARTag is the only system to date to use edge based methods for the unique feature location step.
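As an illustration of the edge-based alternative to global thresholding discussed above, the sketch below finds convex four-sided contour candidates from an edge map. It assumes OpenCV 4.x and is a generic outline for illustration only, not ARTag's actual detection pipeline.

```python
# Generic edge-based quad candidate finder (illustrative; not ARTag's algorithm).
import cv2
import numpy as np

def find_quad_candidates(gray, canny_lo=50, canny_hi=150, min_area=100.0):
    """Return candidate 4-corner outlines found from edge contours."""
    edges = cv2.Canny(gray, canny_lo, canny_hi)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    quads = []
    for c in contours:
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4 and cv2.contourArea(approx) > min_area \
                and cv2.isContourConvex(approx):
            quads.append(approx.reshape(4, 2).astype(np.float32))
    return quads
```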

There are other visual markers used in industry to convey information; some of them are included in this discussion for completeness, however they are not suitable as fiducial markers, as will be explained. Their purpose is to carry information, not to be identified in a camera image under perspective distortion as is needed for fiducial marker systems. Five such systems are shown in Fig. 3.

Fig. 3. 1-D and 2-D pattern systems used in industry.

ARTag was inspired by ARToolkit and Datamatrix, and was designed in an attempt to capture the best qualities of both for the purpose of creating a small but highly reliable fiducial. Datamatrix, Maxicode and QR all have a common thread of encoding data using binary values for reflectance; the pattern is typically bitonal, reducing the decision made per pixel to a threshold decision. This reduces the lighting and camera sensitivity requirements and removes the need for photometric calibration and linearization of the signal (i.e. no attempt is made to identify shades of grey). ARTag was chosen by the criteria described in Section II-B.

A. ARTag

ARTag is a planar fiducial marker system consisting of either 1001 or 2002 possible markers (depending on whether both black-on-white and white-on-black border polarities are used); each marker has a square border and an interior containing 36 cells which encode a 10-bit marker ID with error detection and correction. The markers (Fig. 4) are bitonal, thus not requiring photometric camera calibration, and the square border unique feature is found by an edge-based method to handle lighting variation and occlusion. In the ARTag module itself, each marker is independently found, with no history of previous image frames. Because the ID is the result of an algorithm, no pre-stored patterns such as ARToolkit's pattern files are required.

Fig. 4. 12 examples of ARTag markers out of the library of 2002 available markers (or only 1001 if only one polarity of background colour is used). ARTag markers are 10 units wide, with the center 6x6 cells carrying a 10-bit ID.

The whole marker is 10 x 10 units, with a border of thickness 2 units, leaving 36 cells in the interior to carry information. Each cell is only black or white and carries one bit of digital data. Thus a 36-bit word can be extracted from a camera image of the marker once the boundary is determined. Digital processing is performed on the 36 bits to recognize or reject the quadrilateral as an ARTag marker, and to report its ID if recognized.

Space applications can provide uncontrolled lighting, with large variations of illumination of markers between and within an image frame. ARTag's edge-based quadrilateral detection allows it to keep functioning as long as the camera's shutter and dynamic range provide digital data with some difference between white and black (at least 5 grey levels out of 256). Edge pixels are thresholded and linked into segments, which are in turn grouped into "quads". The four corners of each quad boundary are used to create a homography mapping to sample the marker's interior. Once quadrilateral border contours have been located, the internal region is sampled with a 6 x 6 grid and assigned digital symbols '0' or '1'. The 36-bit binary sequence encoded in the marker encapsulates a 10-bit ID using digital methods. The extra 26 bits provide redundancy to reduce the chances of false detection and identification, and to provide uniqueness over the four possible rotations. The system was designed as if it were a communication system: the marker creation is the transmission; the lighting, occlusion, perspective distortion and noise constitute the communications medium; and the real-time marker decoding is the receiver. A Cyclical Redundancy Check (CRC) and forward error correction (FEC) are the digital methods used to identify whether the 36-bit code is part of the ARTag marker set, and to extract its ID. The 36 data bits from the marker cells were provisioned into 10 for the payload ID, 16 for a CRC-16 checksum, and the remaining 10 bits for FEC, which can correct 2 bit errors. The 10 payload bits do not reside in specific cells; rather, the ID and redundancy are convolved and spread among all bits. The set of 1024 possible marker interiors is reduced to 1001 to improve the Hamming distance between markers. ARTag marker corners are located with sub-pixel precision, with a typical jitter of about 0.05 pixels, and a minimum marker size of 15-17 pixels with a focused camera. ARTag markers are detected individually and then combined in the system proposed herein: the docking pose for an image frame is calculated by considering all the markers detected in a frame, as described in Section III.
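The interior sampling step just described can be sketched as follows: assuming a homography H that maps marker-plane coordinates (a 10x10-unit marker with a 2-unit border) to image pixels, the 36 cell centres are projected into the image and thresholded to bits. The simple mean-based threshold here is an assumption for illustration, not ARTag's actual decision rule.

```python
import numpy as np

def sample_interior_bits(gray, H):
    """Sample the 6x6 interior cells of a detected marker through homography H."""
    pts = []
    for j in range(6):                     # rows of the 6x6 interior grid
        for i in range(6):                 # columns
            pts.append([2.5 + i, 2.5 + j, 1.0])   # cell centres in marker units
    pts = np.asarray(pts).T                # 3 x 36 homogeneous marker-plane points
    img_pts = H @ pts
    img_pts = img_pts[:2] / img_pts[2]     # perspective divide -> pixel coordinates
    samples = np.array([gray[int(round(v)), int(round(u))]
                        for u, v in img_pts.T], dtype=float)
    bits = (samples > samples.mean()).astype(int)   # '1' for bright cells (assumed rule)
    return bits.reshape(6, 6)
```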

B. Comparing Marker Systems

A space vision docking system must have very high reliability: it should function in adverse lighting conditions and should have a vanishingly low (ideally zero) rate of providing a false inter-craft pose estimate. The false-positive and inter-marker confusion rates describe how outlier detections can produce a dangerous false pose estimate, while the false-negative rate affects how often the system will simply not be able to report a pose, which could be just as dangerous if too much time goes by with no pose estimate provided. The false-positive detection rate is critical: a marker system should have a very low and calculable rate of reporting a fiducial marker when none is present. With ARToolkit's correlation based approach, these false detections can be quite common. With the digital code based systems, the number of data bits provisioned for error checking determines the possible immunity to false detection. For every quadrilateral found in the image, assuming a high entropy interior pattern, the false positive rate can be estimated as 2^-c, where c is the number of checksum bits. For example, with the 4x4 systems (SCR, HOM, IGD), if c bits are allocated for a checksum, the system will have an estimated false positive rate of 2^-c per candidate quadrilateral. Matrix [11] (Fig. 2) has 25 bits with 9 bits or less for checksum, giving a similar result. ARTag devotes 26 of its 36 bits to checking and correction, giving a much lower false positive rate; the rate would be lower still if ARTag performed no forward error correction, since error correction necessarily accepts additional bit patterns as valid. ARToolkit Plus [1] has no checksum but checks that the 9-bit payload ID is repeated four times, which also yields a low false positive rate. Whatever marker system is used, the combination of several markers into a composite pattern reduces the final false positive result: the odds of false positives combining to match the geometric arrangement of a composite pattern reduce the overall false positive rate to below that of a single marker.

The false negative rate is also important; the docking system should provide a pose estimate for every image frame. Error correction schemes allow the marker detection process to still recognize a marker despite bit errors. Despite ARToolkit Plus's very low false positive rate, it has no error correction and will not detect a marker as often as ARTag, despite having the same capacity of 36 bits. The Binary Square Marker system [3] is the only marker system other than ARTag which is known to use error correction. Binary Square Marker can correct 2 bit errors using parity error correction; however, doing so raises its false positive detection rate considerably. The library size is also an issue: the marker system should support a large number of markers so that they can be placed all over the spacecraft(s) without needing to repeat markers. ARToolkit Plus provides a library size of 512 markers, Binary Square Marker provides 2048, and ARTag provides up to 2002. The false positive rate, the false negative rate, and the library size are at odds with one another, and represent a trade-off between missing a marker, seeing a non-existent one, and how many different markers are available. These three parameters depend on how many bits are provisioned for error checking, error correction, and ID respectively. Because of the rotational symmetry of the quadrilateral markers described, several systems not using checksums allocate 4 bits (usually the corner cells) to determine orientation (Binary Square Marker and the Canon system are known to do this). However, this introduces a special vulnerability to these 4 cells: if even one is corrupted, the orientation is unknown. Clearly the larger the number of data bits/cells, the greater the possibility for implementing error detection and correction, with only ARTag performing both of these functions. This would suggest making the marker larger; however, this increases the minimum size in pixels that the marker needs. Keeping the marker small is important to retain the largest possible range of distances over which detection can take place.

A factor of great importance not yet mentioned is the inter-marker confusion rate: confusing one marker for another creates outliers in the image to 3D-point correspondences, which are likely as problematic as false positives. The ARToolkit Plus system simply uses the ID repeated four times (with an XOR mask), which results in markers with sequential IDs having a Hamming distance of only 4 between them. Hamming distances are a measure of how different digital codes are; they are simply the count of differing bits, and can be used to calculate one component of the inter-marker confusion rate (described in [6], [8]). Many pairs of ARToolkit Plus's markers have a Hamming distance of only 4 between them, resulting in a non-negligible probability of markers being mistaken one for another. The ARTag system addressed this concern directly in the marker system design and has greater Hamming distances between all markers in the library, taking rotation and reflection into account. Considering these digital coding factors, ARTag was selected for its low false positive and inter-marker confusion rates, as well as its ability to reduce the false negative rate due to bit errors, and its large library size. Also, its edge based method provides a much needed immunity from lighting variation. The above analysis has focused on single marker detection; the individual markers are now combined into arrays.
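Two of the quantities discussed above are easy to sketch: the 2^-c false positive estimate for a c-bit checksum, and the rotation-aware Hamming distance between two 6x6 marker interiors. The example codes below are random bit grids for illustration, not actual ARTag codewords.

```python
import numpy as np

def false_positive_rate(checksum_bits):
    """Estimated rate of a random interior pattern passing a c-bit checksum (2^-c)."""
    return 2.0 ** (-checksum_bits)

def min_hamming_distance(cells_a, cells_b):
    """Minimum Hamming distance between two 6x6 bit grids over the four
    90-degree rotations of one of them."""
    return min(int(np.sum(cells_a != np.rot90(cells_b, k))) for k in range(4))

a = np.random.randint(0, 2, (6, 6))
b = np.random.randint(0, 2, (6, 6))
print(false_positive_rate(16), min_hamming_distance(a, b))
```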

III. PROPOSED AUTOMATIC DOCKING SYSTEM

A robust pose estimate can be determined by computer vision algorithms processing imagery from one or more cameras on one of the spacecraft viewing markers mounted on the other craft. The 6-DOF (degree of freedom) pose can be converted by a simple position offset and rotation to determine the pose error between the incoming craft and where it would be if correctly docked. ARTag markers can be mounted in unused spots on the first spacecraft (space station, satellite, etc.) with different sizes to allow detection over a long range, from far away to very close. The markers would be spread out so that when the docking craft is near, markers extend across the field of view to give good pose accuracy. Each detected marker is matched to its 2D or 3D location, known a priori, providing 4 point correspondences between the image and the world. Let $(X, Y)$ be a marker corner point if all markers are coplanar, or $(X, Y, Z)$ be a point in 3-space if the markers are not coplanar, and let $(u, v)$ be its image location adjusted for non-ideal lens parameters (most commonly radial and thin prism distortion). In the coplanar case $(u, v)$ is related to the world coordinates by a homography $H$ (Eqn. 1), whereas in the non-coplanar case $(u, v)$ is related to the world coordinates by a projection matrix $P = K[R\,|\,t]$ (Eqn. 2):

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \cong H \begin{bmatrix} X \\ Y \\ 1 \end{bmatrix} \qquad (1)$$

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} \cong P \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} = K\,[R\,|\,t] \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (2)$$

$$K = \begin{bmatrix} f_x & s & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (3)$$

$$A\,h = 0 \qquad (4)$$

$$A = \begin{bmatrix} X_1 & Y_1 & 1 & 0 & 0 & 0 & -u_1 X_1 & -u_1 Y_1 & -u_1 \\ 0 & 0 & 0 & X_1 & Y_1 & 1 & -v_1 X_1 & -v_1 Y_1 & -v_1 \\ & & & & \vdots & & & & \end{bmatrix} \qquad (5)$$

$$A\,p = b \qquad (6)$$

$$A = \begin{bmatrix} X_1 & Y_1 & Z_1 & 1 & 0 & 0 & 0 & 0 & -u_1 X_1 & -u_1 Y_1 & -u_1 Z_1 \\ 0 & 0 & 0 & 0 & X_1 & Y_1 & Z_1 & 1 & -v_1 X_1 & -v_1 Y_1 & -v_1 Z_1 \\ & & & & & \vdots & & & & & \end{bmatrix}, \qquad b = \begin{bmatrix} u_1 \\ v_1 \\ \vdots \end{bmatrix} \qquad (7)$$

Each adjusted point provides two equations in a linear expansion of $H$ or $P$. If matrix $H$ is rewritten as a 9-dimensional vector $h = [h_{11}, h_{12}, \cdots, h_{33}]^T$, the problem can be expressed as finding $h$ to satisfy Eqn. 4, where $A$ is a matrix with $2N$ rows and 9 columns. $N$ is the number of marker corners matched between the image and the a priori model, and is a multiple of 4 since every ARTag marker provides 4 correspondence points. Each correspondence point, consisting of $(X_i, Y_i)$ and $(u_i, v_i)$, creates two rows in $A$ (Eqn. 5). Eqn. 4 is solved in the least squares sense by applying singular value decomposition (SVD) to $A$. Similarly, $P$ is rewritten as an 11-dimensional vector (with the last element of $P$ fixed to 1) to solve Eqn. 6 by SVD on $A$; two rows in $A$ and two entries in $b$ are filled (Eqn. 7) by each correspondence $(X_i, Y_i, Z_i)$ and $(u_i, v_i)$. Thus from a set of image-to-2D or image-to-3D point correspondences, a matrix $H$ or $P$ can be found for that image frame. If the camera parameters ($K$) are known a priori, as is assumed in a spacecraft, the inverse matrix $K^{-1}$ can be premultiplied into $H$ or $P$ to extract the pose (rotation matrix $R$, translation vector $t$); $K$ (Eqn. 3) contains the focal lengths $f_x$ and $f_y$, the skew factor $s$, and the principal point $(u_0, v_0)$. As with camera calibration algorithms, the pose could be refined by a few non-linear iterations that minimize the reprojection error. Good results were obtained without this stage, but it may be desirable for a real system.

A system can be based solely on a planar array, or it could be a hybrid system where the pose is determined either by the homography or by the projection matrix calculation. A decision criterion is then needed to choose which of the two to use; a possible scheme is to test whether the 3D points of the correspondence set found in an image can be fit to a plane within a threshold. This could be done by eigen-plane methods. This warrants further investigation, as there may be issues of pose consistency when switching between planar and 3D pose estimation.

Two prototype systems are shown: one with a planar array (Fig. 5) using Eqn. 4 to determine pose, and one with a 3D arrangement of markers (Eqn. 6). The planar docking system was created using a 43 cm wide array and allowed pose extraction over a range of 10 cm to 7 metres; the largest markers were 14 cm across, which corresponds to a marker width of 22 pixels at the maximum distance. A 3D system was made with a satellite mock-up 25 cm wide with markers placed around the exterior. The marker positions were automatically determined with bundle adjustment from a set of images (see [7]); however, this was done for convenience, and in a real space system they would likely be measured accurately instead. The "satellite's" pose was detected up to 4 metres away; the largest markers were 7 cm across, corresponding to a marker width of 19.5 pixels in the image. The ARTag marker system was measured in [8] to be detectable down to about 15 pixels with the same camera. The camera used in both cases was a greyscale digital video camera of resolution 640x480 pixels with an 8 mm lens providing a focal length of 1100 pixels⁴.

⁴ IEEE-1394 Dragonfly camera from Point Grey Research.
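As a back-of-envelope check of the figures quoted above, the apparent width of a marker under a pinhole model is roughly w_px ≈ f·W/Z for focal length f in pixels, physical width W and distance Z; with f = 1100 pixels this is consistent with the ~22 and ~19.5 pixel widths reported for the two prototypes.

```python
def apparent_width_px(marker_width_m, distance_m, focal_px=1100.0):
    """Approximate on-image marker width for a pinhole camera: f * W / Z."""
    return focal_px * marker_width_m / distance_m

print(apparent_width_px(0.14, 7.0))   # largest planar-array marker at 7 m -> ~22 px
print(apparent_width_px(0.07, 4.0))   # largest mock-up marker at 4 m      -> ~19 px
```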

Fig. 5. Prototype coplanar array of ARTag markers; the detected corners of each marker in a camera image are combined with the known spatial coordinates of the marker to calculate a homography and determine pose.
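A minimal numpy sketch of the homography estimation of Eqns. 4-5 (standard DLT), followed by a rough pose extraction assuming H ≅ K[r1 r2 t] for a calibrated camera. This is a generic textbook formulation for illustration, not the exact implementation used in the prototype.

```python
import numpy as np

def estimate_homography(world_xy, image_uv):
    """DLT: each 2D-2D correspondence gives two rows of A (Eqn. 5); h is the
    right singular vector of A with the smallest singular value (Eqn. 4)."""
    rows = []
    for (X, Y), (u, v) in zip(world_xy, image_uv):
        rows.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y, -u])
        rows.append([0, 0, 0, X, Y, 1, -v * X, -v * Y, -v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return Vt[-1].reshape(3, 3)

def homography_to_pose(H, K):
    """Rough R, t from a plane-induced homography, assuming H ~ K [r1 r2 t]."""
    B = np.linalg.inv(K) @ H
    if B[2, 2] < 0:                    # choose the sign that puts the plane in front of the camera
        B = -B
    lam = 2.0 / (np.linalg.norm(B[:, 0]) + np.linalg.norm(B[:, 1]))
    r1, r2, t = lam * B[:, 0], lam * B[:, 1], lam * B[:, 2]
    R = np.column_stack([r1, r2, np.cross(r1, r2)])
    U, _, Vt = np.linalg.svd(R)        # re-orthonormalize the rotation
    return U @ Vt, t
```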

The software ran at a frame rate of greater than 10 fps on a 3.0 GHz Pentium PC. The not-quite-real-time speed performance is partly due to the non-ideal software and device driver infrastructure; the ARTag algorithm could easily run at 60 or 100 fps if optimized onto a DSP or FPGA. Some screen shots are shown in Figs. 6 and 7; the planar system has the 6-DOF information superimposed. In a real system the full 6-DOF pose would be used to control the incoming craft, but for visualization this prototype system provides a control signal for the greatest error component. To demonstrate the system, a user can watch these control signals (arrow at middle bottom of the screen shots) alone and guide the camera to the center point of the array. As mentioned, in a real system the ARTag markers would be mounted all over at least one of the two spacecraft, in ad hoc non-coplanar positions. The 3D positions of the markers can be measured by taking many images of the space station or satellite and automatically creating the 3D coordinates of each marker corner; this was demonstrated for a non-space application with many ARTag markers placed on an irregular curved object in [7]. In general, the more cameras that are used, the lower the undesirable probability that many false negatives create moments when no pose information is available (most likely caused by a camera being over-saturated, or recovering from over-saturation such as after seeing the sun). The marker corner


results from all the cameras would be combined into a single pose determination. The system could also be enhanced by placing cameras and markers on both spacecraft; with communication between the craft, an even more reliable and accurate pose could be determined.

Fig. 6. (Top and Bottom) ARTag marker array detected at 6.7 and 0.1 metres respectively.

Fig. 7. Satellite mock-up with a 3D arrangement of markers; pose is found by first calculating a projection matrix.

IV. CONCLUSIONS

A pose determination method for an automatic computer vision based spacecraft docking system was proposed, and a prototype system demonstrated. The system uses fiducial marker arrays mounted on surfaces of one craft viewed by a camera (or cameras) on the other craft. Such a system can be used in the final tens or hundreds of metres of a docking procedure, after RADAR and LIDAR systems have brought the spacecraft within a hundred metres of each other. Computer vision based fiducial marker systems were described, and all known such systems were compared. The ARTag fiducial marker system was chosen for its low false positive, inter-marker confusion, and false negative rates, as well as its immunity to lighting variation and partial occlusion. Overall, a robust and fully automatic pose determination system using low cost and accurate computer vision algorithms was created, addressing the need for very high reliability in space applications.

REFERENCES

[1] ARToolkit Plus webpage, http://studierstube.org/handheld_ar/artoolkitplus.php.
[2] R. Bencina, M. Kaltenbrunner, and S. Jorda, "Improved topological fiducial tracking in the reacTIVision system," in Proceedings of the IEEE International Workshop on Projector-Camera Systems (Procams 2005), San Diego, USA, 2005.
[3] P. Boulanger, "Application of augmented reality to industrial tele-training," in Proc. of CRV'04 (Canadian Conference on Computer and Robot Vision), pages 320-328, May 2004.
[4] D. Claus and A. Fitzgibbon, "Reliable fiducial detection in natural scenes," in Proceedings of the 8th European Conference on Computer Vision, Prague, Czech Republic, May 2004.
[5] D. Kotake, S. Uchiyama, and H. Yamamoto, "A marker calibration method utilizing a priori knowledge on marker arrangement," Nov. 2004.
[6] M. Fiala, "ARTag, a fiducial marker system using digital techniques," in CVPR'05, pages 590-596.
[7] M. Fiala, "The squash 1000 tangible user interface system," in 2005 IEEE/ACM International Symposium on Mixed and Augmented Reality (ISMAR 2005), Vienna, Austria.
[8] M. Fiala, "ARTag Revision 1, a fiducial marker system using digital techniques," National Research Council Publication 47419/ERB-1117, November 2004.
[9] M. Fiala, "Fiducial marker systems for augmented reality: Comparison between ARTag and ARToolkit," in MIRAGE 2005, INRIA Rocquencourt, France, Mar. 2005.
[10] L. Naimark and E. Foxlin, "Circular data matrix fiducial system and robust image processing for a wearable vision-inertial self-tracker," in ISMAR 2002: IEEE/ACM International Symposium on Mixed and Augmented Reality, Darmstadt, Germany, Sept. 2002.
[11] J. Rekimoto, "Matrix: A realtime object identification and registration method for augmented reality," in Proceedings of 3rd Asia Pacific Computer Human Interaction (APCHI '98), pages 63-68, 1998.
[12] J. Rekimoto and Y. Ayatsuka, "Cybercode: Designing augmented reality environments with visual tags," in Proceedings of DARE 2000 on Designing Augmented Reality Environments, pages 1-10, Elsinore, Denmark, Oct. 2000.
[13] X. Zhang, S. Fronz, and N. Navab, "Visual marker detection and decoding in AR systems: A comparative study," in IEEE and ACM International Symposium on Mixed and Augmented Reality (ISMAR), pages 97-106, Sep. 2002.



End Notes and Survey

Current and future challenges in space robotics and vision
Leo Hartman and Ioannis Rekleitis
Canadian Space Agency
[email protected], [email protected]
6767 Airport Rd., St-Hubert QC J3Y 8Y9 Canada
www.space.gc.ca

Abstract

The Canadian Space Agency is involved in several robotics activities, including on-orbit servicing and assembly and planetary exploration. With NASA's Vision for Space Exploration likely to dominate the global space sector for the coming decades, the importance of, and our dependence on, robotics, and therefore vision and other wide area sensing technologies, for reliable and lower cost missions will only increase. This presentation will provide a short overview of CSA's current robotic challenges and, by way of summarizing the workshop, the possible challenges and opportunities of the future.

Biographies

Leo Hartman received his PhD in Computer Science from the University of Rochester in 1990. His dissertation investigates the use of decision theory for allocating computational resources in deductive planning. After a post-doc at the University of Waterloo, he joined the Canadian Space Agency in 1993 as a research scientist. His work there focuses on computing, networking and automation for space missions.

Ioannis Rekleitis is currently a visiting fellow at the Canadian Space Agency and an adjunct professor at the School of Computer Science, McGill University. His work focuses on developing software systems for autonomous planetary exploration and on-orbit servicing. He also works on sensor networks and distributed robotics. During 2002-2003 he was a post-doctoral fellow at Carnegie Mellon University with Prof. Howie Choset, working on multi-robot coverage and single robot exploration. He obtained his Ph.D. and M.Sc. degrees from the School of Computer Science, McGill University, in 2002 and 1995 respectively. His Ph.D. work was on "Cooperative Localization and Multi-Robot Exploration" (supervisors: Gregory Dudek and Evangelos Milios). His M.Sc. thesis was titled "Optical Flow Estimation by Analysis of Motion Blur" (supervisors: Godfried T. Toussaint and David Jones). He received a B.Sc. degree from the Department of Informatics, Faculty of Sciences, University of Athens, Greece, in 1991. His interests are in robotics, computer vision, artificial intelligence, sensor networks, multi-robot systems, and software architectures for complex systems.

