Technical Report 95/12
Model-Based Circumnavigating Autonomous Robots

Nick Barnes and Zhi-Qiang Liu
Department of Computer Science, The University of Melbourne, Parkville, Victoria 3052, Australia. May 1995
Abstract
In this report, we propose a system for vision-guided autonomous circumnavigation, allowing robots to navigate around objects of arbitrary pose. The system performs model-based object/view recognition from an intensity image using a canonical viewer-centred model. A path planned from a geometric model guides the robot in circumnavigating the object. This system can be used in many applications where robots have to recognise and manipulate objects of unknown pose and placement. Such applications occur in a variety of contexts such as factory automation, underwater exploration, space exploration and nuclear power station maintenance. We also define a canonical view graph to model objects, which is a viewer-centred representation.
1 Introduction

Many tasks for autonomous mobile robots in manufacturing applications, dangerous environments, and space or underwater exploration involve a robot manipulating or closely observing/inspecting an object. In order to interact with an object, an autonomous mobile robot must identify the object, be able to move with respect to the object, and in some cases dock with the object in a specified way. Pages et al. [15] criticise most current manufacturing autonomous mobile robots for inflexibility, as they have to follow white strips or buried wires in order to move about the workplace. Hormann and Rembold [7] point out that the future of assembly automation is dependent on flexibility and reliability. In this report we describe an autonomous robot system which is able to navigate around a given object. This circumnavigating system provides greater flexibility for robots that need to interact with objects, because only a model of the object is required, not a model of the environment or of the robot's position relative to the environment. Thus, the system's model does not have to be modified if the environment changes. Our system performs 3D object recognition regardless of the object's pose and position. The system assumes the robot is close enough to the object for recognition. It identifies both the object and the view of the object using a viewer-centred object representation. The inverse perspective
transform of the object is then calculated, allowing the robot to geometrically generate a safe path around the object from the object model. The robot follows the path to the required relative location using odometry. The system checks that each new view of the object is expected given the robot's belief about the object's position, and its own relative position and motion. This increases the robustness of identification, and prevents accumulation of the uncertainty inherent in the navigation and odometry. To solve complex autonomous tasks, Arkin and MacKenzie [1] presented a methodology for breaking tasks into a sequence of temporally integrated perceptual algorithms and transitions between these algorithms. The system presented here could be applied in such a methodology. The robot would initially move close to the object using other methods available in the literature, for instance [22]. When the object is recognised, the transition to our system occurs. Our system identifies the pose of the object, and guides the robot to a specific location around the object, where a task, such as docking, is required. We assume that further specific interactions with the object, such as docking, will be handled by other methods, for example [13], once our system has guided the robot to the required location on the object. Our viewer-centred model representation is similar to the aspect graph [9]. However, we do not enumerate every theoretically possible view, only canonical views. Loosely speaking, canonical views are general views with restricted overlap, differentiated by large-scale features. The canonical views
representation is a graph, where each node is a general view, and each edge links adjoining views along the robot's 2D path around the object. Thus the graph has no branches, and a traversal will visit, in order, the views encountered by the robot while moving around the object, i.e., the representation is indexed by order of appearance. Once the object has been recognised in our model-based approach, provided robot movements are approximately known, the robot always knows which view of the object to expect based on its location relative to the object. The robot navigation problem described here defines a causal relation between each of the object images to be matched, and our canonical view graph represents this relation. The advantage of a viewer-centred representation for vision is that only a 2D-to-2D match is required for each view, significantly reducing the computational cost of evaluating a candidate view.
2 Related Work

In this section we briefly describe some autonomous mobile robot systems that are related to our work. Rembold [18] describes a typical assembly task in which a robot derives the necessary information about a part from the assembly drawing and bill of materials; it then navigates to a suitable docking position at an assembly workstation and assembles the required parts. The system plans a path
through a known geometric world model to a docking location. It moves using vision sensors and odometry, avoiding unexpected obstacles, to the approximate docking location. To perform the assembly the system either docks using touch sensors, or adjusts its operations to the docking error. This system requires a world model in which workstation location and pose are specified. Arkin and Murphy [2] break the type of navigation task presented by Rembold into ballistic and controlled movement. Such a breakdown facilitates temporal sequencing of several algorithms for different tasks [1]. In ballistic motion the robot moves rapidly in the general direction of the desired object guided by a so-called phototropic perception algorithm, which attracts the robot to bright light. A transition occurs when the robot recognises the required object. It then moves in a controlled fashion, which is slower but more precise, to dock using simple visual feedback based on a single target region of known size and position relative to the dock. A second transition occurs when the robot moves too close for visual feedback. Ultrasound guides final docking. The system does not have to model the whole environment, and performs model-based controlled motion. However, the system is unaware of the model of the whole object. As a result, it requires the docking surface of the object to face the robot. Localisation of an autonomous robot with respect to a map or bird's-eye view image has been the topic of significant research. There have been many
methods applied to identify location, for instance specific topological features [20] and landmarks [19]. These systems assume that the environment is sufficiently complex to ensure that a restricted feature set is unique at any given point. The Hermies system [22] moves autonomously about a nuclear power station to perform maintenance at a control panel. This type of system uses a full world model, in which some obstacles, moving and stationary, do not have to be represented, but the pose and position of the required control panel with respect to the environment must be known. In this report we propose an autonomous robotic system for a different problem from those described above. Most of these systems navigate based on a world model, and do not discuss interacting with objects whose location and pose with respect to the world model is unknown. Although these systems may avoid obstacles that are not included in the model, they are not able to interact with these objects. Arkin and Murphy [2] do not require a model of the whole environment. Their model-based controlled motion allows some
flexibility for alignment with the object; however, only one view of the object is modelled. Further, they do not address the case where the workstation is rotated, occluding the required view.
3 Difficulties in Object Navigation

There are some difficulties associated with a robot navigating around a specified object. For instance, background objects that are similar to the object concerned may confuse the robot, resulting in it moving to the wrong object, or wandering in an entirely wrong direction. An example of such a problem is illustrated in our second experiment, where the robot must navigate around a particular model car while a very similar car sits right beside the desired one. Object recognition allows us to discriminate quite similar objects. If there are identical objects and the robot is told to navigate around a particular one, our system follows the required object by constantly updating its expectation about its position. However, if the odometry error is large, or the object recognition is poor, such a system may still move to the wrong object. This situation can be handled by fixation [17] or by tracking object features while the robot moves. However, we will not discuss this further in this report. Deep and narrow concavities in the object can also create difficulties for following the object's surface. Figure 1 shows an object with a concavity that is too narrow to access. A concavity that is just wider than the robot could cause a naive system to get stuck, whereas a knowledge-based system can reason about the size of the concavity relative to itself and determine how far into such a concavity it can safely move, or whether to go into such
a concavity at all.
Figure 1: Expected navigation path for an object with a hole too narrow for robot access
4 System Architecture

Our proposed system navigates around three-dimensional objects with the robot moving on the ground plane. We assume that the camera is located at a known distance above the plane. Camera elevation may be varied as long as the absolute height is known. As a result there are only three degrees of freedom (one rotational and two translational) for calculating the respective locations of the robot and object. Figure 2 shows the system architecture. There are four basic parts: pre-processing; view recognition; determination of the inverse perspective transform; and model-based geometric path planning. Pre-processing takes a raw
image and returns the required features, for instance edge segments, that can be used for matching in view recognition. View recognition matches the features to views in a canonical view model, identifying the view in the image and the correspondence between the image and the model. Image-to-model correspondences are used to calculate the inverse perspective transform. The inverse perspective transform gives the approximate pose and position of the object relative to the camera. From a relative fix on the object's location the robot can reason about which path it should take around the object to attain its goal, and then executes movement along the planned path. In the following sections we will discuss these components in detail.
[Figure 2 depicts the processing pipeline: the camera provides an input image; pre-processing (edge extraction and edge segmentation) produces a list of edge parameters; object recognition/edge matching against the object model produces a list of image/model edge correspondences; pose and distance determination yields the camera/model coordinate transform; and model-based geometric path planning outputs the next robot position.]

Figure 2: System Architecture
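As an illustration only, the following Python sketch shows how the four stages of Figure 2 might be sequenced in a top-level control loop. All of the callables passed in (grab_image, preprocess, recognise, estimate_pose, plan_move, move_to) are hypothetical placeholders standing in for the components described in the following sections, not part of the implemented system.

```python
def circumnavigate(grab_image, preprocess, recognise, estimate_pose, plan_move, move_to):
    """Top-level control loop sketched from Figure 2 (hypothetical interfaces).

    grab_image()                 -> raw camera image
    preprocess(image)            -> list of edge segment parameters
    recognise(edges, expected)   -> (matched view, image/model correspondences)
    estimate_pose(corrs, view)   -> inverse perspective transform (theta, Tx, Ty)
    plan_move(pose, view)        -> (next subgoal or None when done, expected next view)
    move_to(subgoal)             -> odometry-guided move
    """
    expected_view = None                      # pose unknown before the first match
    while True:
        edges = preprocess(grab_image())
        view, correspondences = recognise(edges, expected_view)
        pose = estimate_pose(correspondences, view)
        subgoal, expected_view = plan_move(pose, view)
        if subgoal is None:                   # required location on the object reached
            return
        move_to(subgoal)
```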
4.1 Pre-processing

At this stage, edges are extracted from the image and segmented into lines of near-uniform curvature. The image may be smoothed with a nearest-neighbour median filter to reduce noise. It is preferable to avoid this process as it may distort corners and edges. Edge extraction is performed by the Canny edge detector, followed by edge synthesis [4]. We attempt to extrapolate a short initial line segment as far as possible by continually fitting curves until the extrapolation error becomes too great, corresponding to a corner (a sharp change in curvature). We fit both a quadratic and a straight line, and choose the fit which produces the longest segment. Extrapolation is performed using Neville's algorithm [16]. The output of pre-processing is a list of start and end points, and of curvatures, for all edge segments in the image to be used for recognition.
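To make the segmentation step concrete, the Python sketch below splits an ordered chain of edge pixels at the first point where extrapolation fails. It is a simplified stand-in for the report's procedure: least-squares polynomial fits are used in place of Neville's algorithm, and the seed length and error threshold are assumed values.

```python
import numpy as np

def segment_edge(points, max_err=1.5, seed=5):
    """Split an ordered chain of edge pixels [(x, y), ...] into a segment of
    near-uniform curvature and the remaining chain. Both a straight line
    (degree 1) and a quadratic (degree 2) are grown from a short seed; the
    segment ends where the extrapolated next point misses the actual pixel
    by more than max_err (a corner), and the longer of the two fits wins."""
    def grow(degree):
        end = seed
        while end < len(points):
            t = np.arange(end)
            px = np.polyfit(t, [p[0] for p in points[:end]], degree)
            py = np.polyfit(t, [p[1] for p in points[:end]], degree)
            x_pred, y_pred = np.polyval(px, end), np.polyval(py, end)
            if np.hypot(points[end][0] - x_pred, points[end][1] - y_pred) > max_err:
                break                          # sharp change in curvature
            end += 1
        return end

    cut = max(grow(1), grow(2))                # keep the fit giving the longest segment
    return points[:cut], points[cut:]
```

Applying segment_edge repeatedly to the remainder would yield the full list of segments, from which the start points, end points and curvatures used for recognition can be taken.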
4.2 Object Recognition

The recognition system recognises both the object, and the view of the object in the image. We use a viewer-centred, model-based object recognition technique, similar to the aspect graph [9] representation.
4.2.1 Aspect Graphs

Aspect graphs enumerate every possible characteristic view of an object. A characteristic view is a continuous region of views, for which all the visible
edges and vertices, together with their connectivity relationships, are the same [21]. Thus, neighbouring characteristic views are distinguished by a change in the visual feature topology of the object. This is called a visual event. Characteristic views are general in the sense that a continuous range
of viewpoints around the object is represented by a single view. In an aspect graph, each characteristic view is represented by a node, and a visual event is represented by an edge between neighbouring nodes. See [5] for a more complete definition. The aspect graph representation has several problems which prevent it from being practical in the real world [6]. Specifically, aspect graphs have no adequate treatment of scale, and they lack an adequate indexing mechanism for views. The scale problem leads to views being generated that can never be seen in practice, or are extremely unlikely to be viewed. Further, once a real camera is used, some features represented will be too small to appear, or to be distinguished from noise. Such redundant representation can generate millions of views for complex objects, which creates problems in both computation and indexing. In this report we propose a canonical view representation which does not suffer from these problems for the specific domain of robot navigation.
4.2.2 Canonical Views

We may eliminate many views stored within an aspect graph by the following observations:

1. Real cameras have finite size, violating the pinhole assumption.

2. Objects have features that contribute little to recognition.

3. A robot will not generally be expected to recognise an object at a very close range.

4. Camera images are discrete, thus for a particular viewing range, features below a certain size are not distinguishable.

5. Causality present in mobile robots can play an important role in recognition.
Definition 1 A model-scale view is the set of neighbouring characteristic views which are differentiated only by features that contribute little to recognition.
Figure 3(a) shows a close-up of a wheel hub. The small features of this hub, including the wheel nuts, generate many visual events. Figure 3(b) shows the edges extracted from a model view taken from the model car used in our experiments. The wheel nuts are not represented in a model-scale view, as they are visible on both sides of the car, and occur many times on each side.
The circle of the wheel is a much more distinct feature for identifying the wheels, thus the circle may be included as a model-scale feature. The edges extracted from the wheel nuts are poor due to low contrast on the hubs, making them even less useful for recognition.
Figure 3: There are 5 wheel nuts on each of the four wheels; 2 wheels are visible from each side of the car. (a) Common features with little contribution to view discrimination, such as these wheel nuts, may generate many visual events. (b) Edges from one of the model views of the model car, showing wheel nuts.

A model-scale view representation may enormously reduce the number of views for a particular object by not representing surface texture and features that are repeated on the object, or in the background. Figures 4(a) and 4(b) are images from two slightly different viewing angles of the model car. Several minor visual events occur between the images; for instance, the hub of the nearest front wheel becomes occluded in 4(b). Generating separate generic views within a model based on such small visual events would mean that all other visible surfaces have to be represented twice. We define a subsumed view that incorporates such separate model-scale views
into a single view.
Figure 4: Visual events of relatively small features differentiate these images: (a) the hub of the front wheel is visible; (b) the hub of the front wheel is occluded.
Definition 2 A subsumed view is the set of all neighbouring model-scale views which are differentiated by visual events involving features of size s, where the largest surface in the view is of size l, and

s / l ≤ k,

where k is a constant.
For instance, in Figures 4(a) and (b) the largest surface is the side of the car, so l is set accordingly. k can be set empirically to allow s to be up to the size of the car wheels, see Figure 5. Note this does not reduce the granularity of the representation. No features are removed by this definition; however, the range of a subsumed view may be larger. The number of model-scale views
included within a single subsumed view, and hence the number of subsumed views of a model, is related to k.
Figure 5: Setting a value for k: l is the width of the side of the car; k is set empirically to allow s to be up to the size of the wheel.

However, the subsumed-view representation includes views in which the surfaces modelled are a subset of the surfaces of other views. To alleviate this redundant representation we define a canonical-view representation.
Definition 3 A canonical-view is a set of all views of a subsumed-view graph that are not redundantly contained within any view of the set.
To give a simple example, Figure 6 shows a simple cube. For the visible portion of the cube, three neighbouring faces, the standard aspect graph would generate a total of 7 views. One possible canonical-view representation is derived by recognising the fact that all the views are a subset of view 7, and so only view 7 is represented. A true match should be given by a subsumed-view representation if any of the seven views appears. A second possible canonical-view representation would include views 1, 3, and 5. There are more separate
views, but the matching scheme is simplified. In such a scheme an image of view 7 would match any of the three canonical views.
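A minimal sketch of Definition 3 for this example, assuming each view is described simply by the set of surface labels it contains:

```python
def canonical_views(subsumed_views):
    """Keep only views whose surface set is not strictly contained in the
    surface set of another view (Definition 3)."""
    return [v for v in subsumed_views
            if not any(v < w for w in subsumed_views)]   # strict subset test

# For the cube of Figure 6, views of faces A, B and C reduce to a single canonical view:
cube_views = [{"A"}, {"B"}, {"C"}, {"A", "B"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C"}]
print(canonical_views(cube_views))   # [{'A', 'B', 'C'}]
```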
Figure 6: A partial aspect graph representation of a cube.

A canonical view represents a range of actual views, and thus, as discussed in the next section, must allow a range for features, introducing scope for error. However, the robot moves around the object and matches several causally related views in the model. The combination of evidence from each
view match greatly increases the certainty of identification. Shape-from-shading methods [8] can be used to verify the identification of surfaces if this is required to discriminate between objects in a particular domain.
4.2.3 Features

Ideally, surface features used for viewer-centred object recognition would be invariant for all views where the surface is visible. However, Burns et al. [3] prove that there are no general-case invariant visual features under perspective projection, as feature values "blow out" for some views, see Figure 7(a). However, features are said to exhibit low view variation if the variation is small in extent over a large fraction of the views. In Figure 7(a), although the features of surface 1 have extreme values, recognition is still possible on the basis of surfaces 2 and 3. In the class of objects with adjoining surfaces at acute angles, see Figures 7(b) and (c), the features of all visible surfaces may be in extreme ranges. Such objects are difficult for viewer-centred object recognition.
Figure 7: Examples of feature "blow out": (a) the features of surface 1 are at extreme values, while surfaces 2 and 3 are normal; (b) a wedge-shaped object; (c) the view from the tip of the wedge, where all visible surfaces are distorted.

Only edge features are used to recognise objects in our canonical view model; features of surfaces are not considered. Features are either of a single edge or of an edge pair, and are similar to those discussed by Burns et al. [3]:

1. relative length of edges (Continuous range)
2. orientation of edges (Continuous range)

3. coincidental end-points of edges ([True, False])

4. relative location of edges ([Above, Below], [Left-of, Right-of])

Burns et al. analyse the variance of relative orientation of edges. However, in our current system the object only rotates about the Z axis, see Figure 8(a), so the edge orientation relative to the image y-axis, α, is stable. The variance of α due to perspective is proportional to the angle of the edge to the image plane, η, and to the relative displacement of the line end-points from the camera focal centre, ∆d. Also, as η increases, shortening the horizontal component of the edge without dilating the vertical component, α may decrease.
Figure 8: Coordinate systems of camera, model and image. (a) The camera to object transform. (b) The image plane.

The relative length of two edges varies, by definition, when there is a greater change in the length of one edge at different view angles. Edge length
decreases proportionally to the distance from the camera, and proportionally to η (for edges with a horizontal component). The end-points of two edges in the image are labelled coincidental if the ratio of the distance between the nearest end-points, d, to the length of the shorter of the two lines, l, is less than a constant c for the range of the canonical view:

d / l ≤ c.
The relative location of edges is a stable feature because the object only rotates about the z-axis. An edge pair is labelled only if every point on one edge is above/below/left-of/right-of every point of the other, for every viewing angle. If an edge pair has a left-of or right-of relation in one view it will be true in all views, except for cases where one edge is on a limb which is above the surface of the other edge, see Figure 9. Above/below relations are affected by perspective. Parallel lines at the same depth in the image will not cross; however, non-parallel edges at varying depths may. The perspective effects vary with displacement from the image centre in x, and with η, see Figure 10.
Figure 9: In (a) edge 1 is right-of edge 2; in (b) edge 1 is left-of edge 2; but edge 1 is left-of edge 3 whenever both are visible.
Figure 10: Edges B and C are parallel and at the same depth, but edge A is at a different depth: (a) A is above B, B is above C; (b) B is above C, but A is not above B.

Orientation and relative length are continuously valued. The variance of α can be estimated from perspective projection for the largest absolute values of η and ∆d. The variance of relative edge length can be estimated for perspective projection of maximum absolute object rotations, and maximum and minimum distances, that generate extremes of relative edge distance and relative α for each pair of edges. This can be derived empirically by taking images of
the object at the extremes, or by using the perspective equations for rotation only about the z-axis, see Equations (6) and (7). Two edges are coincidental in the model if they have a common end-point in the object, and the edge extraction and segmentation can show this reliably within the range described. Relative position labels can be assigned to parallel lines, excepting the limb case described above. Other edge pairs need to be considered at the extremes of view ranges in the same manner as for orientation and relative
length.
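As an illustration, the sketch below computes the four pairwise features for two straight edge segments in image coordinates. It is a simplification: the coincidence threshold c is an assumed value, a y-up image convention is assumed for the above/below labels, and the view-range variance checks described above are omitted.

```python
import math

def edge_pair_features(e1, e2, c=0.2):
    """Pairwise features for two edges, each given as ((x1, y1), (x2, y2))."""
    def length(e):
        (x1, y1), (x2, y2) = e
        return math.hypot(x2 - x1, y2 - y1)

    def orientation(e):
        # Angle of the edge to the image y-axis.
        (x1, y1), (x2, y2) = e
        return math.atan2(abs(x2 - x1), abs(y2 - y1))

    l1, l2 = length(e1), length(e2)

    # Coincidental end-points: nearest end-point distance d over the length of
    # the shorter edge must satisfy d / l <= c.
    d = min(math.hypot(px - qx, py - qy) for (px, py) in e1 for (qx, qy) in e2)
    coincident = d / min(l1, l2) <= c

    # Relative location labels hold only if they hold for every pair of points.
    e1_left_of_e2 = max(x for x, _ in e1) < min(x for x, _ in e2)
    e1_above_e2 = min(y for _, y in e1) > max(y for _, y in e2)

    return {
        "relative_length": min(l1, l2) / max(l1, l2),
        "orientations": (orientation(e1), orientation(e2)),
        "coincident_end_points": coincident,
        "e1_left_of_e2": e1_left_of_e2,
        "e1_above_e2": e1_above_e2,
    }
```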
4.2.4 Experimental Canonical-View Model

This section gives a brief example of how a canonical view representation is constructed. The example is from our second set of experiments, navigating around the model car pictured in the centre of Figure 11. Each of the four views shown represents the adjacent surface of the car. The canonical views generated overlap at the corners. Feature value ranges were derived from the segmented edge images of these views, with variance estimated using the perspective equations.
Figure 11: The four canonical views of the model car used in our second experiment.
4.2.5 Causal Image Matching

The system initially must check the image against every view to find the best match, as the object pose is entirely unknown. However, once the first match
is made and a move performed, the next view is causally determined based on the previous position and the movement. Each subgoal point, where the system will visually check its position, is linked to the canonical view that the system expects to see. Thus the system generally needs to match only one view for each move. In this report, the result of the matching process is a correspondence between edge end-points in the image and the model.
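A sketch of this strategy, assuming the canonical views are held in order of appearance around the object and that match_fn scores an image against a single view (both names are hypothetical):

```python
class CanonicalViewGraph:
    """Canonical views stored in order of appearance around the object."""

    def __init__(self, views):
        self.views = views                     # ordered list of view models

    def initial_match(self, image_features, match_fn):
        # Pose entirely unknown: score the image against every canonical view.
        scores = [match_fn(image_features, v) for v in self.views]
        return max(range(len(self.views)), key=scores.__getitem__)

    def expected_view(self, current_index, step=1):
        # After a move, only the causally determined neighbour needs checking;
        # step is +1 or -1 depending on the direction of travel.
        return (current_index + step) % len(self.views)
```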
4.3 Determining Object Pose and Distance

Research such as that by Linnainmaa et al. [10] determines object pose from a single image with six degrees of freedom (three in translation and three in rotation). Sophisticated techniques, such as the Hough transform [10] or Newton's method in many dimensions [11], are often necessary to determine pose. As we discussed in previous sections, the object only has three degrees of freedom (two translational and one rotational). Camera height can be adjusted, but the absolute height, Tz, is known. This restriction makes an algebraic solution possible, assuming a pinhole camera of known focal length f, and known correspondences between two object points (X1, Y1, Z1), (X2, Y2, Z2) and image points (u1, v1), (u2, v2). Figure 12 shows the transformation from model coordinates to camera coordinates.
Figure 12: The camera and model coordinate systems

From the model coordinate system, the object is rotated θ degrees clockwise about the Z axis. The camera coordinate system views the object with a 90 degree rotation clockwise about the Z axis, followed by a translation of Tz along the new Z axis, Ty along the new Y axis, and Tx along the new X axis. By inspection, the transformation from the model coordinate system to the camera coordinate system is:

Xc = Tx + Ym cos θ − Xm sin θ,   (1)

Yc = Ty − Ym sin θ − Xm cos θ,   (2)

Zc = Zm + Tz.   (3)
The perspective equations for image coordinates (u, v) from camera coordinates (Xc, Yc, Zc) are:

u = f Xc / Yc,   (4)

v = f Zc / Yc.   (5)

Thus we have:

u = f (Tx + Ym cos θ − Xm sin θ) / (Ty − Ym sin θ − Xm cos θ),   (6)

v = f (Zm + Tz) / (Ty − Ym sin θ − Xm cos θ),   (7)
where the origin of (Xm, Ym, Zm) is on the line defined by the centre of the image, u = 0, v = 0. The height of the camera above the ground plane, Tz, is known, as is the focal length f of the camera. From here all (X, Y, Z) are assumed to be model coordinates, and coordinate system subscripts are dropped. At point (u1, v1), dividing u by v gives

u1 / v1 = (Tx + Y1 cos θ − X1 sin θ) / (Z1 + Tz),

giving:

(u1 / v1)(Z1 + Tz) = Tx + Y1 cos θ − X1 sin θ.   (8)

Dividing at (u2, v2) we get:

(u2 / v2)(Z2 + Tz) = Tx + Y2 cos θ − X2 sin θ.   (9)
Rearranging (7) at v1:

(f / v1)(Z1 + Tz) = Ty − Y1 sin θ − X1 cos θ.   (10)

Also, at v2:

(f / v2)(Z2 + Tz) = Ty − Y2 sin θ − X2 cos θ.   (11)

Subtracting (8) from (9) yields:

(u2 / v2)(Z2 + Tz) − (u1 / v1)(Z1 + Tz) = cos θ (Y2 − Y1) − sin θ (X2 − X1).

Subtracting (10) from (11) yields:

(f / v2)(Z2 + Tz) − (f / v1)(Z1 + Tz) = −sin θ (Y2 − Y1) − cos θ (X2 − X1).
Combining these yields:

cos θ = { (Z2 + Tz)[(u2 / v2)(Y2 − Y1) − (f / v2)(X2 − X1)] − (Z1 + Tz)[(u1 / v1)(Y2 − Y1) − (f / v1)(X2 − X1)] } / [ (Y2 − Y1)^2 + (X2 − X1)^2 ],   (12)

sin θ = { (Z1 + Tz)[(f / v1)(Y2 − Y1) + (u1 / v1)(X2 − X1)] − (Z2 + Tz)[(f / v2)(Y2 − Y1) + (u2 / v2)(X2 − X1)] } / [ (Y2 − Y1)^2 + (X2 − X1)^2 ].   (13)
Rearranging (11):

Ty = (f / v2)(Z2 + Tz) + Y2 sin θ + X2 cos θ.   (14)

Rearranging (9):

Tx = (u2 / v2)(Z2 + Tz) − Y2 cos θ + X2 sin θ.   (15)
More accurate estimates for θ are obtained if (12) is used when the right-hand side yields a value close to 1, and (13) is used when the right-hand side yields a value close to 0. Note also that if v1 = v2 = 0 there is no solution, and in the case where v1 and v2 are approximately equal the solution will be poor. We select points maximally displaced in the image in x and y to minimize such errors.
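The closed-form solution can be checked numerically. The Python sketch below implements equations (6) and (7) for projection and equations (12)-(15) for recovery; the model points, pose values and focal length in the example are arbitrary, and the degenerate cases noted above (v1 = 0, v2 = 0, or v1 ≈ v2) are not handled.

```python
import numpy as np

def project(P, theta, Tx, Ty, Tz, f):
    """Image coordinates (u, v) of model point P = (Xm, Ym, Zm), eqs. (6)-(7)."""
    Xm, Ym, Zm = P
    Yc = Ty - Ym * np.sin(theta) - Xm * np.cos(theta)        # depth denominator
    u = f * (Tx + Ym * np.cos(theta) - Xm * np.sin(theta)) / Yc
    v = f * (Zm + Tz) / Yc
    return u, v

def recover_pose(P1, P2, uv1, uv2, Tz, f):
    """Recover (theta, Tx, Ty) from two model/image correspondences, eqs. (12)-(15)."""
    (X1, Y1, Z1), (X2, Y2, Z2) = P1, P2
    (u1, v1), (u2, v2) = uv1, uv2
    dX, dY = X2 - X1, Y2 - Y1
    A = (u2 / v2) * (Z2 + Tz) - (u1 / v1) * (Z1 + Tz)        # (9) minus (8)
    B = (f / v2) * (Z2 + Tz) - (f / v1) * (Z1 + Tz)          # (11) minus (10)
    denom = dX ** 2 + dY ** 2
    cos_t = (A * dY - B * dX) / denom                        # eq. (12)
    sin_t = -(A * dX + B * dY) / denom                       # eq. (13)
    theta = np.arctan2(sin_t, cos_t)
    Ty = (f / v2) * (Z2 + Tz) + Y2 * sin_t + X2 * cos_t      # eq. (14)
    Tx = (u2 / v2) * (Z2 + Tz) - Y2 * cos_t + X2 * sin_t     # eq. (15)
    return theta, Tx, Ty

if __name__ == "__main__":
    P1, P2 = (0.2, 0.1, 0.3), (-0.3, 0.4, -0.1)              # arbitrary model points
    theta, Tx, Ty, Tz, f = 0.4, 0.5, 2.0, 0.8, 0.01          # arbitrary pose and focal length
    uv1, uv2 = project(P1, theta, Tx, Ty, Tz, f), project(P2, theta, Tx, Ty, Tz, f)
    print(recover_pose(P1, P2, uv1, uv2, Tz, f))             # approximately (0.4, 0.5, 2.0)
```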
4.4 Model-Based Path Planning

To circumnavigate an object, a robot should move around the object in a single direction, following the surface normal of the object at a safe distance away. The distance is considered safe if it is greater than any single move the robot will make. The robot will generally move to the next occluding boundary of the closest visible surface, restricting the possibility of moving past any features it has not seen. This heuristic is suitable for objects that are approximately square and not very large in comparison with the size of the robot. Moves should be broken up for long thin objects, and for objects large in comparison with the robot's size. The distance the robot should remain from the object is problematic. In our second experiment, where docking is performed, the robot moves around the object at a fixed distance, moving closer only when the docking location is in sight. Generally, the robot-to-object distance should be large enough for most of the object to fit into the view frame to allow recognition, but small enough for object points to be well spread in the image, allowing accurate pose and distance determination. Navigation is performed as a series of view-based local navigation problems. The robot determines the closest point on the object's surface, and the closest occluding corner point, in the direction of robot motion. If the corner point is less than a threshold distance from the closest point to the robot,
and the robot is approximately at the required distance from the object, the robot is said to be at the corner. The robot's behaviour is guided by the surface normal of the object, and by whether it is at a corner, as follows:

1. If the robot is not at a corner, it should move to be at the required distance from the object along the surface normal at the nearest corner in the direction of robot motion. See the movement from the initial position to Point 1 in Figure 13.

2. If the robot is at a corner, at the leading edge of a surface in the direction of robot motion, it should move along the surface, at the required distance along the surface normal, to the next occluding boundary of the surface. For example, the robot movement from Point 2 to Point 3.

3. If the robot is at a corner, at the end of a surface in the direction of robot motion, it should move around the corner, subtending an arc with the radius of the required distance, and centred at the corner. It should finish at the same corner on the next surface, at the required distance along the surface normal to that point. For example, the robot movement from Point 5 to Point 6.

Note, however, if a concavity in the object is encountered, the robot remains at least the required distance from any surface at all times. In Figure 13, in moving from Point 3 to 4 the robot does not move around to
be perpendicular to the small surface behind the screen, as that would bring it too close to the adjoining edge; similarly for Point 9.

Figure 13: Circumnavigation path derived for the terminal in the first experiment.

For each move, a trajectory can be calculated to follow the determined path, and the robot should follow this using odometry. The path around the object can be determined a priori. The system presented in this report assumes no obstacles block the robot's path. If obstacles were present, the robot could still generate the ideal path a priori. When obstacles are encountered the robot must decide whether it can move between the obstacle and the object, whether it should back-track around the object, or whether to attempt to track around the obstacle. To determine the path, we expand the boundaries of the object in two dimensions, similar to the configuration space approach [12].
The object surface is represented by planar faces and piecewise quadric surfaces. Planar surfaces are represented as a list of connected boundary points. The points making up the widest projected edge or edges in the plane of robot motion are expanded by taking the normal to the surface at each point c. This is done by taking the cross-product of the vectors from two other points in the plane, a and b, to c:

vac × vcb,

and normalizing the x, y components:

x̂ = x / sqrt(x^2 + y^2),   ŷ = y / sqrt(x^2 + y^2).

Figure 14: The vector cross product of three points determines the normal.

For the point c = (xc, yc), the projected point at the required distance d is:
(xc + x̂ d, yc + ŷ d),

where d is the required distance from the object. For a planar surface, the points are joined to form the extended boundary. Elliptical surfaces are represented in the model as a centre point (xc, yc, zc), and radii in (x, y, z) of (a, b, c). Only convex, non-rotated ellipsoids are handled presently:

(x − xc)^2 / a^2 + (y − yc)^2 / b^2 + (z − zc)^2 / c^2 = 1.

The expanded ellipsoid is formed by expanding each of the radii by d, see Figure 15:

(x − xc)^2 / (a + d)^2 + (y − yc)^2 / (b + d)^2 + (z − zc)^2 / (c + d)^2 = 1.

The widest extent of the ellipsoid in the plane is at the intersection of the ellipsoid and the plane in x, y for the minimum absolute value of z. We approximate this with the smallest value for z on this plane, zmin:

(x − xc)^2 / (a + d)^2 + (y − yc)^2 / (b + d)^2 + (zmin − zc)^2 / (c + d)^2 = 1.

Note that this is only an approximation, as the path will not be exactly at distance d from the ellipse at all times.
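The planar-face expansion above can be sketched as follows; it assumes a, b and c are three points on the same planar face, ordered so that the normal vac × vcb points away from the object.

```python
import numpy as np

def expand_boundary_point(a, b, c, d):
    """Push boundary point c outward by distance d in the plane of robot motion,
    along the (x, y) components of the face normal v_ac x v_cb (Figure 14)."""
    a, b, c = (np.asarray(p, dtype=float) for p in (a, b, c))
    n = np.cross(c - a, b - c)                 # v_ac x v_cb
    norm_xy = np.hypot(n[0], n[1])
    if norm_xy == 0.0:                         # face parallel to the ground plane
        raise ValueError("normal has no component in the plane of robot motion")
    x_hat, y_hat = n[0] / norm_xy, n[1] / norm_xy
    return (c[0] + x_hat * d, c[1] + y_hat * d)
```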
Figure 15: The expanded ellipsoid.

Arcs around corners between adjacent surfaces are formed by taking the projected points for the corner on the two surfaces and joining them with an arc of radius d, see the corners in Figure 13. Finally, any lines which are subsumed by other lines or sets of lines (closer to the object) are discarded. In Figure 13, note that the two short, collinear boundaries behind the curved front surface (screen) have been subsumed between points 5 and 6. Any lines which intersect are cut at the intersection, see where the arc 3-4 joins edge 4-5. Points on the object model are marked as occluding boundaries. The expanded point for any occluding boundary that is not subsumed is marked as a destination for local navigation. For complete circumnavigation tasks the system guides the robot around the object until it returns to the first surface encountered. In our second experiment, the required docking position is marked on the object model.
From the robot's initial position it can calculate the closest corner points on the object boundaries in both directions. The robot can then count the total number of stop points required to travel to the destination in either direction, and choose the direction which requires the minimum number of stop points.
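A minimal sketch of this direction choice, assuming the expanded stop points are indexed in clockwise order around the object:

```python
def choose_direction(num_stop_points, start_index, goal_index):
    """Return the direction with the fewer stop points between the robot's
    nearest corner (start_index) and the destination (goal_index)."""
    clockwise = (goal_index - start_index) % num_stop_points
    anticlockwise = (start_index - goal_index) % num_stop_points
    return "clockwise" if clockwise <= anticlockwise else "anticlockwise"

# e.g. 3 stop points one way versus 7 the other favours the clockwise direction
print(choose_direction(10, 0, 3))   # clockwise
```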
5 Experimental Results

We have conducted two sets of experiments: one using simulated images of a computer terminal, see Figure 16, and one using real images of a model car against the background of the Computer Vision and Machine Intelligence Lab (CVMIL) at the University of Melbourne, see Figure 18. The simulated images were generated using the POV ray-tracing package, whereas the model car images were taken by a Minitron Javelin camera.
5.1 Simulated Circumnavigation

The task is to fully circumnavigate the computer terminal, returning to the first surface encountered.
Figure 16: Sample views of the simulated object: (a) initial view; (b) view after the first move; (c) view after the second move; (d) view after the third move.
Figure 16 shows experimental images of the terminal, including the initial view and the first three moves around the simulated object.
Figure 17: Expected and actual positions

Table 1 lists the parameters estimated by the system for each image taken as it circumnavigated the object. Figure 17 shows a bird's-eye view of the points the system was expected to move to, versus the actual points estimated by the system. The numbers in Figure 17 correspond to the destination points for the view numbers of Table 1. The angle is the error in the system's estimate of the relative angle between the image plane and the object's surface. The distance is the error in the estimate of the distance to the object, as a percentage of the required distance to the nearest object surface. The displacement is the error in the estimate of the perpendicular displacement of the object, as a percentage of the required distance to the nearest object surface. Finally, the destination location is the error in the estimate of the next location the robot should move to, as a percentage of the required distance to the surface.
Note in Figure 17 that there are no actual points for the expected points between (3) and (4), and between (7) and (8). If the system calculates that the closest point on the object is within a threshold distance of a corner, the robot is considered to be at that corner and so progresses to the next corner. The object corners corresponding to the first skipped point and point (3) are close enough together that both points fall within the minimum threshold of point (3). Thus the robot moves directly to point (4). The same applies to the other skipped point.
View   Angle (degrees)   Distance (%)   Displacement (%)   Destination location (%)
1      2                 1              8                  7
2      3                 1              2                  6
3      2                 1              0                  6
4      2                 0              0                  8
5      1                 0              0                  2
6      2                 0              1                  7
7      0                 1              1                  1
8      0                 0              0                  1
9      1                 1              1                  1
10     1                 1              1                  12

Table 1: Errors for parameter determination by view for the simulated images.

This simulation demonstrates circumnavigation of a simulated terminal. The system is able to recognise the terminal and its relative pose and position, and to generate a safe path around it, always maintaining the knowledge of its approximate position relative to the terminal.
5.2 Docking With a Model Car

Our second series of experiments uses a model car to demonstrate an application of this system in an industrial automation setting. The robot is required to identify the car and navigate around it to the driver's side door (passenger side in US cars), where it is to move in close to the back edge of the door for final docking. To do this, it should determine the shortest path (clockwise or anti-clockwise) around the object, and must pick the required car out of the two cars in the first image. The canonical view model used for this experiment is shown in Figure 11.
Figure 18: Camera images of the car from the experimental run: (a) initial view; (b) first view chosen; (c) second view chosen; (d) final docking position.

Figure 18 shows the views from the camera as it was moved around, based on the system's navigation directives. From (a) the system determines that the nearest boundary of the model is the front corner to the left of the image. The number of points to be visited in either direction around the object is three moving clockwise, or seven moving anti-clockwise, thus clockwise is chosen. Figure 20 shows the experimental setup at the final docking position shown in Figure 18(d). Table 2 shows the errors for this navigation trial around the car, and Figure 19 shows the corresponding expected navigation path
around the car versus the actual positions. The columns of the table are the same as for the previous experiment. Note that the destination error for view 3 is a percentage of the docking distance.
Figure 19: Expected and actual positions for the second experiment

As can be seen from Figure 19, the system was able to generate a safe path around the car, and to arrive close to the required docking position.
View   Angle (degrees)   Distance (%)   Displacement (%)   Destination location (%)
1      10                5              1                  8
2      5                 7              0                  52
3      5                 7              7                  21

Table 2: Errors for parameter determination by view for the model car.
Figure 20: Experimental setup with camera at the docking position
6 Conclusion

We have implemented a system to guide an autonomous robot in circumnavigating objects of arbitrary pose. By applying three-dimensional object recognition and pose determination the robot was able to plan a path around the object, based on a geometric object model. This allowed the system to navigate safely around a known object. The technique handles errors in positional estimation, and errors that would occur in odometry, by frequently visually checking its position relative to the object, and by restricting the distance moved on the basis of any single estimate, relative to its distance from the object. We also defined a canonical-view model for object recognition suited to mobile robots. We are currently implementing canonical views using Minsky's frames [14]. Each view is represented by a frame, with slots being subframes that match surfaces, and rules defining which combination of surfaces must
be present for the canonical view to be matched. Each subframe matches a single surface, and its slots are filled by features of the object. The slots initially hold the allowable range of values for feature variance. Subsequent experiments will include background objects blocking the robot's path; in this case the robot will have to move around the obstacle or backtrack around the object. Further experiments will apply partial shape-from-shading to discriminate between objects with similar wire-frames, and will allow for partial modelling, so that the path can be planned on the basis of the estimated surface shape. Also to be investigated are fixation techniques [17] to allow the system to navigate around a specified object when there are similar or identical objects in the background.
References

[1] R. C. Arkin and D. MacKenzie, Temporal coordination of perceptual algorithms for mobile robot navigation, IEEE Trans. on Robotics and Automation, 10(3) (1994).
[2] R. C. Arkin and R. R. Murphy, Autonomous navigation in a manufacturing environment, IEEE Trans. on Robotics and Automation, 6(4) (1990).
[3] J. B. Burns, R. S. Weiss, and E. M. Riseman, View variation of point-set and line-segment features, IEEE Trans. on Pattern Analysis and Machine Intelligence, 15(1) (1993).
[4] J. F. Canny, Finding edges and lines in images, Master's thesis, MIT Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Massachusetts (1983).

[5] D. W. Eggert, K. W. Bowyer, C. R. Dyer, H. I. Christensen, and D. B. Goldgof, The scale space aspect graph, IEEE Trans. on Pattern Analysis and Machine Intelligence (1993).
[6] O. Faugeras, J. Mundy, N. Ahuja, C. Dyer, A. Pentland, R. Jain, and K. Ikeuchi, Why aspect graphs are not (yet) practical for computer vision, in Workshop on Directions in Automated CAD-Based Vision, 97-104 (1991).

[7] A. Hormann and U. Rembold, Development of an advanced robot for autonomous assembly, in IEEE Int. Conf. on Robotics and Automation (1991).

[8] B. K. P. Horn and M. J. Brooks, Shape from Shading, The MIT Press, Cambridge, Massachusetts (1989).
[9] J. J. Koenderink and A. J. van Doorn, The internal representation of solid shape with respect to vision, Biological Cybernetics, 32, 211-216 (1979).

[10] S. Linnainmaa, D. Harwood, and L. S. Davis, Pose determination of a three-dimensional object using triangle pairs, IEEE Trans. on Pattern Analysis and Machine Intelligence (1988).
[11] D. G. Lowe, Three-dimensional object recognition from single two-dimensional images, Artificial Intelligence (1987).

[12] T. Lozano-Perez, Spatial planning: A configuration space approach, IEEE Trans. on Computers, C-32(2) (1983).
[13] K. Mandel and N. A. Duffie, On-line compensation of mobile robot docking errors, IEEE Journal of Robotics and Automation, RA-3(6) (1987).

[14] M. Minsky, A framework for representing knowledge, in P. H. Winston, editor, The Psychology of Computer Vision, McGraw-Hill, New York (1975).

[15] J. Pages, J. Aranda, and A. Casals, A 3D vision system to model industrial environments for AGV's control, in Proceedings of the 24th International Symposium on Industrial Robots (1993).
[16] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling, Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press (1990).

[17] D. Raviv and M. Herman, A unified approach to camera fixation and vision-based road following, IEEE Trans. on Systems, Man and Cybernetics, 24(8) (1994).
[18] U. Rembold, The Karlsruhe autonomous mobile assembly robot, in S. S. Iyengar and A. Elfes, editors, Autonomous Mobile Robots, IEEE Computer Society Press, California, 2, 375-380 (1991).

[19] S. Tachi and K. Komoriya, Guide dog robot, in S. S. Iyengar and A. Elfes, editors, Autonomous Mobile Robots: Control, Planning, and Architecture, IEEE Computer Society Press, California, 2, 360-367 (1991).

[20] R. Talluri and J. K. Aggarwal, Position estimation for an autonomous mobile robot in an outdoor environment, IEEE Trans. on Robotics and Automation, 8(5) (1992).
[21] N. A. Watts, Calculating the principal views of a polyhedron, in 9th International Conference on Pattern Recognition, 316-322 (1988).
[22] C. R. Weisbin, G. de Saussure, J. R. Einstein, F. G. Pin, and E. Heer, Autonomous mobile robot navigation and learning, IEEE Computer, 22(6) (1989).