
Autonomous Robot Navigation with Automatic Learning of Visual Landmarks

Salvatore Livatino and Claus B. Madsen
Laboratory of Image Analysis, Aalborg University
Fr. Bajers Vej 7D, DK-9220 Aalborg East, Denmark
Email: [email protected], [email protected]

Abstract. By observing visual landmarks it is possible to continuously update an estimated robot position while the robot is moving. In particular, using a triangulation algorithm based on three landmarks, the robot position can be estimated each time three landmarks are observed. If the selected landmark triplet is an optimal triplet, the estimated robot position is accurate and errors do not accumulate. In order to make the system autonomous, both acquisition and observation of landmarks have to be carried out automatically. The paper consequently proposes a method for learning and navigating a working environment. In particular, a two-phase procedure is proposed: the first phase performs an automatic acquisition of visual landmarks, and the second phase estimates the robot position during navigation. Since visual landmarks acquired during the first phase have to be recognized during the second phase, a method is proposed for accurate landmark acquisition, so that recognition is tolerant to positional errors and changes in viewpoint. Feasibility and applicability of the proposed method rest on a system with a simple setup. The novelty and potential lie in combining algorithms for optimal landmark-triplet selection, triangulation, and robust landmark acquisition.

1 Introduction

Mobile robotics is a very important field of research because of its many potential applications. In addition to performing its task, a mobile robot needs navigation skills. For example, a robot for cleaning floors needs to be able to clean as well as to navigate the environment. Navigation algorithms for mobile robot systems consist of three major elements: 1) path planning, 2) obstacle avoidance, and 3) self-localization. The role of the latter is to continuously compute the position of the robot relative to an environment model (a precise description of the robot workspace). This paper addresses self-localization with automatic learning of an environment model.

1.1 Self-Localization

Most of the self-localization algorithms proposed in the literature require the use of an environment model. This information is compared to the sensory input, so that any discrepancy can be used to correct errors inherent to the dead-reckoning system (errors in wheel-movement feedback due to slippage, etc.). There are numerous approaches to self-localization, and there are different sensor modalities as well (vision, laser range-finders, sonars, etc.). In this context we shall only consider vision and visual landmarks. A visual landmark is a structure in the environment which can be recognized in images. [1] uses vertical edges; [2], [5], [7], [13] use texture patches instead. This paper adopts the latter, since texture is considered rich and discriminating information. Figure 1, left-hand side, shows examples of visual landmarks.

The role of visual landmarks is to constrain the robot with respect to the surrounding space. For this purpose landmarks can be used in several ways. One approach, presented in [4], involves a Kalman filter framework for updating the robot position using the measured distance or heading to a single landmark. It must be noted, though, that the robot position cannot be computed from either the distance or the heading to a single landmark, but only updated. Another use of landmarks involves triangulation, [3], [5]. In particular, by knowing the camera parameters and the landmark world-positions, the robot position and heading can be estimated once three landmarks are detected in the camera image-plane. This paper also proposes the use of triangulation because of its simplicity and the fact that it does not involve any 3D reconstruction. Moreover, triangulation exploits information from three landmarks and allows the robot position to be re-computed each time three landmarks are observed. In this way the system is tolerant to possible errors arising from previous pose estimations, i.e., errors do not accumulate.

In practice, detection of landmarks in images is uncertain, and this leads to an error in the computed robot position. In previous work, [5], we have shown that for some landmark triplets this error can be so severe that navigation is impossible. On the other hand, this error can also be small enough to allow a very accurate pose estimation. An interesting point is that the uncertainty can be predicted for each landmark triplet, as shown in [3], so that we can always choose a triplet which minimizes positional uncertainty before computing the robot position. If we do this automatically while the robot is moving, we carry out an accurate and reliable localization. This paper consequently proposes the use of this method for robot self-localization. The present paper extends our previous work on triangulation-based navigation, but looks specifically at the issue of how to automate the learning of visual landmarks.

Fig. 1.: The left-hand figure shows examples of visual landmarks. The right-hand figure summarizes the main aspects affecting the appearance of a landmark in the image-plane: the aspects on which the appearance depends, and the aspects to which the appearance is sensitive. The effects of some changes may become a source of error. For example, changes in lighting conditions may generate reflections, which can lead to errors in the estimated robot position during self-localization.

1.2 Learning the Environment

In order to extend the capabilities of the proposed self-localization system, it would be very beneficial to be able to learn the environment, that is, to automatically build an environment model. This makes the system more autonomous, avoiding the burden of either building models by hand or using specially structured environments. Typically, for autonomous systems proposed in the literature, there is a learning phase prior to self-localization, [1], [2], [6], [7]. Following this guideline, a learning phase is proposed for our system. In our case this means an automatic acquisition of visual landmark images and their positions in the environment. The landmarks acquired in the learning phase are used during self-localization for computing the robot position. In particular, some of the acquired landmarks have to be observed and recognized in the incoming images.

Recognition of previously acquired visual landmarks is a complex and difficult task. Visual landmarks are characteristics of the environment which are observed by a camera, and they may therefore appear very different during learning and during self-localization. In fact, the appearance of a landmark in the image-plane depends on many aspects, mainly: landmark geometry and texture, environment lighting conditions, and the distance and angle between camera and landmark. The landmark shape and its texture structure are stable characteristics, that is, they will not change once we have chosen our landmark; meanwhile, the lighting conditions and the distance and angle to a landmark may differ between learning time and self-localization time, and the system performance is very sensitive to such changes.

In particular, the appearance of a landmark in the image-plane is very sensitive to changes in environment illumination, since these can produce undesired reflections, highlights, and shadows. The appearance of a landmark is also very sensitive to changes in the distance and angle to the landmark, since different aspects of objects may become visible, and occlusions and perspective distortions may be generated. Further differences may arise from errors in landmark and camera position, so that the appearance of a landmark in the image-plane is not as expected. In particular, errors in landmark position can be generated either by a noisy position computation or by changes in the landmark texture (the latter could be the effect of object movements). Errors in camera position mainly show up as changed lighting conditions, different visible aspects of an object, occlusions, and perspective distortions in the image-plane. Figure 1, right-hand side, summarizes the main aspects affecting the landmark appearance in the image plane.

A main task for the learning phase is consequently to select suitable characteristics of the environment to be used as landmarks. The selected landmarks have to be robust to changes in environment conditions and camera position, and to errors affecting the camera position. How visual selection can be performed, and how this may help the selection of suitable landmarks for our system, gives rise to a number of questions. For example: what is useful information? What is an appropriate selection criterion? Some of these problems are addressed in this paper. It has been necessary to introduce some constraints in order to reduce the complexity of the problem in this first step of experimentation: the environment is assumed to be mostly static, that is, only a few changes may occur, and the illumination may vary only moderately. Hence, a method has been proposed for the learning phase in order to perform: a) a fast workspace-view acquisition; b) a robust visual-landmark selection.

a) A workspace-view acquisition is performed by computing a few panoramic views from known camera positions. In particular, by rotating a camera around its optical center it is possible to collect workspace views and synthesize them into one panoramic view using cylindrical projections, [8]. A panoramic view is a compact and complete representation of the whole workspace, and it only requires a simple setup. At least two panoramic views will be computed in order to allow reconstruction of visual-landmark positions by stereo matching.

b) A robust visual-landmark selection is achieved by the following two-step procedure. The first step extracts image regions which are viewpoint-invariant and discriminating. For this purpose, we propose the Culhane-Tsotsos attention selection mechanism, [9], in the version presented in [10]. The attention mechanism uses edges and corners as features, since they are robust to viewpoint changes. The potential of including local symmetries as an additional feature was also considered, and the experiments confirmed the advantage of including symmetries. The extracted image regions are called candidate landmarks. The second step selects the subset of candidate landmarks which are observed with low distortion. The analysis in this second step is only possible after positional information has been computed, i.e., the 3D position of each landmark.

In particular, knowledge of the camera and candidate-landmark configurations is required, that is, their position and orientation relative to the workspace coordinate system. The camera positions from which the panoramas are acquired are computed by observing a specially designed reference area. The landmark 3D positions are instead computed by stereo reconstruction. In particular, corresponding views of candidate landmarks inside different panoramas are compared using template matching (normalized cross-correlation), so that disparity information can be found and the 3D position of each landmark can be computed by stereo reconstruction. In order to compute landmark orientations in a simple way, only visual landmarks lying on wall regions will be selected. These situations can be recognized since the landmark 3D positions have been computed and a wall map is provided. Eventually, the perspective distortion affecting the appearance of landmarks in the image-plane can be tested, since the relative position between camera and landmark can be computed. Only candidate landmarks which have been observed with a distortion below an experimentally determined threshold will be elected as visual landmarks for the self-localization phase.

As a consequence of the above problem analysis, an algorithm is proposed for learning the environment. The proposed algorithm contains the following four steps: 1) Panoramic-View Synthesis (i.e., synthesize two or more panoramic views); 2) Candidate Landmark Extraction (i.e., run the attention selection mechanism on the panoramas); 3) Model Building (i.e., compute visual-landmark positions and orientations); 4) Model Refinement (i.e., elect candidate landmarks which are observed with low distortion). The presented two-phase procedure for learning the environment and self-localization provides the mobile robot system with a good level of autonomy and an accurate and reliable pose estimate. The next sections present the proposed method, the experimentation, and the conclusions.

2 The Proposed Method

The physical context for the problem addressed in this paper is a mobile robot moving on a planar surface, equipped with a single camera that can pan freely 360° relative to the robot heading. During the learning phase the robot is taken to a few locations, and the camera position is measured at each location by observing a specially designed reference area. A dense collection of workspace views is taken from each location, and then the proposed algorithm for learning the environment is executed. The algorithm computes an environment model which consists of a set of visual-landmark images and their positions inside the workspace. The system is then ready for self-localization. During self-localization the robot is free to move inside the environment following any desired path, while the proposed algorithm for self-localization estimates the robot position, and is also able to correct errors inherent to the dead-reckoning system. Figure 2, left-hand side, shows the proposed two-phase localization procedure.

Fig. 2.: The left-hand figure shows the proposed two-phase localization procedure. The right-hand figure shows a schematic representation of the landmark detection process.

2.1 Self-Localization Phase

The self-localization procedure has three main steps of computation: 1) optimal-triplet selection; 2) landmark detection; 3) pose computation.

Optimal Triplet Selection This step automatically computes an optimal landmark triplet. In particular, given a reasonable estimate of the robot position, the landmark positions, and the camera focal length, the system is able to evaluate the possible landmark triplets before computing the robot position, using the procedure described in [5]. The triplet which minimizes the positional uncertainty is then chosen, as sketched below.
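As a concrete illustration, the sketch below enumerates all triplets of currently visible landmarks and keeps the one with the lowest predicted uncertainty. The actual uncertainty measure is the one derived in [3] and [5]; here a simple geometric proxy (penalizing nearly collinear bearings) stands in for it, so this is a sketch of the selection loop, not of the criterion itself.

```python
from itertools import combinations
import numpy as np

def bearing(robot_xy, lm_xy):
    """Bearing from the robot to a landmark, in radians."""
    d = np.asarray(lm_xy, float) - np.asarray(robot_xy, float)
    return np.arctan2(d[1], d[0])

def triplet_uncertainty(robot_xy, triplet):
    """Geometric proxy for positional uncertainty (stand-in for the
    criterion of [3]): triangulation degenerates when the pairwise
    angular separations between the three bearings are small."""
    angles = [bearing(robot_xy, lm) for lm in triplet]
    seps = [abs(np.sin(a - b)) for a, b in combinations(angles, 2)]
    return 1.0 / (np.prod(seps) + 1e-9)  # large when nearly collinear

def select_optimal_triplet(robot_xy, landmarks):
    """Evaluate all triplets of visible landmarks and return the one
    minimizing the (proxy) uncertainty."""
    return min(combinations(landmarks, 3),
               key=lambda t: triplet_uncertainty(robot_xy, t))
```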

Landmark Detection This step localizes the optimal triplet in the observed view. In particular, given a reasonable estimate of both robot and landmark positions, the system is able to compute the rotation of the camera needed to capture the optimal triplet. Given the camera focal length and aspect ratio, the visual landmarks can be re-mapped to how they should appear in the image-plane. The re-mapped landmark images are then used as templates in a normalized cross-correlation procedure in order to detect the landmarks in the incoming image. Figure 2, right-hand side, shows a schematic representation of the landmark detection process. The detected locations of the landmarks in the image-plane are the input to the triangulation method. Note that landmark detection with template matching is accurate, but a template must be computed. The template can be computed either by synthesizing a virtual view, for instance using plenoptic-based rendering as in [8], or by perspectively re-mapping representative images of the landmarks.
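A minimal sketch of the detection step, using OpenCV's normalized cross-correlation. The score threshold is illustrative and not from the paper; the template is assumed to be the re-mapped landmark image described above.

```python
import cv2

def detect_landmark(image_gray, template_gray, min_score=0.8):
    """Locate a re-mapped landmark template in the incoming camera
    image with normalized cross-correlation. Returns the best match
    center (x, y) in image coordinates, or None if the score is too
    low (e.g., occlusion or illumination change)."""
    result = cv2.matchTemplate(image_gray, template_gray,
                               cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    if max_val < min_score:
        return None
    h, w = template_gray.shape
    return (max_loc[0] + w // 2, max_loc[1] + h // 2)
```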

Pose Computation This last step computes the robot position using triangulation. In particular, given the camera focal length and the landmark positions in both the camera image-plane and the robot workspace, the triangulation algorithm computes the camera position and heading, which in turn are converted to the robot position and heading.
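The paper does not spell out the triangulation algorithm itself; the sketch below recovers the pose by nonlinear least squares over the three bearing constraints, which is one straightforward way to implement it. Bearings are assumed to be measured in radians relative to the robot heading (e.g., theta = atan(u/f) for an image-column offset u and focal length f).

```python
import numpy as np
from scipy.optimize import least_squares

def wrap(a):
    """Wrap an angle (or array of angles) to (-pi, pi]."""
    return np.arctan2(np.sin(a), np.cos(a))

def triangulate_pose(landmarks_xy, bearings, init_pose):
    """Recover the robot pose (x, y, heading) from the bearings of
    three landmarks at known world positions. Solved as a small
    nonlinear least-squares problem; a closed-form three-point
    resection would work equally well."""
    lms = np.asarray(landmarks_xy, float)
    th = np.asarray(bearings, float)

    def residuals(pose):
        x, y, phi = pose
        pred = np.arctan2(lms[:, 1] - y, lms[:, 0] - x) - phi
        return wrap(pred - th)

    sol = least_squares(residuals, init_pose)
    x, y, phi = sol.x
    return x, y, wrap(phi)
```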

2.2 Learning Phase

The method proposed for the learning phase has four main steps of computation: 1) panoramic-view synthesis; 2) candidate landmark extraction; 3) model building; 4) model refinement.

1) Panoramic-View Synthesis The first step of computation acquires workspace images. It is very important to investigate which locations are convenient for image acquisition, since the quality of the landmark images strongly depends on this choice. In particular, landmark-image definition and distortion are important factors, and they depend on the relative position between camera and landmark. What is an optimal location for image acquisition? How many images should be taken from each location? The ideal situation would be to capture occlusion-free frontal views of landmarks, taken from a convenient distance. However, reproducing this ideal situation is not practical and is difficult to automate. A more feasible and faster solution was therefore chosen, which also fits the available hardware. Images of the workspace are acquired by panning the camera 360° relative to the robot heading. A panoramic view is then synthesized from a dense sampling of images. Cylindrical projections are used, since this fits very nicely the situation of having a camera on a pan unit, and it allows a correct overlapping of images; a sketch of the warping is given below. Figure 3 shows a computed panoramic view of the robot workspace.
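A minimal sketch of the cylindrical warping, assuming a pinhole camera with focal length f in pixels and a rotation exactly about the optical center. After warping, images taken at different pan angles align by pure horizontal translation and can be blended into the panorama.

```python
import cv2
import numpy as np

def warp_to_cylinder(img, f):
    """Re-project a planar camera image onto a cylinder of radius f
    (focal length in pixels). A minimal sketch; lens distortion is
    assumed to be already removed (cf. the central-crop strategy
    used in the experiments)."""
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    # Output grid in cylindrical coordinates: u = f*theta, v = f*h
    u, v = np.meshgrid(np.arange(w) - cx, np.arange(h) - cy)
    theta = u / f
    # Inverse mapping onto the planar image (valid for |theta| < pi/2)
    map_x = (f * np.tan(theta) + cx).astype(np.float32)
    map_y = (v / np.cos(theta) + cy).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```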

Fig. 3.: A computed panoramic view of the robot workspace. The cylindrical image shown has been synthesized by projecting 72 images of 350x350 pixels each. The overlapping area between two adjacent images is about 85%; the maximum overlap is 7 images.

The position of the camera optical center is estimated using the specially designed procedure mentioned above. This procedure is based on recognition and triangulation of three colored papers hanging on a wall at known positions. How many panoramic views should be synthesized, and from which positions? This is a topic which we are going to investigate thoroughly in the future. For the present work, the positions for the panoramic acquisition have been chosen to meet the landmark-position reconstruction requirements. A minimum of two panoramas is needed, and most of the workspace has to be visible from the chosen positions. The main parameters involved in choosing camera locations are described in the subsection 3) Model Building. As a consequence of having few camera locations for the acquisition, and of meeting the stereo reconstruction requirements, the distance and orientation to many potential landmarks may not be advantageous for a robust image acquisition. Nevertheless, there should always be landmarks which are well located, and, as described later, the proposed algorithm is able to recognize this situation.

2) Candidate Landmark Extraction The second step extracts significant panorama sub-regions as candidate visual landmarks. In this way, the visual information to be maintained for the next step consists only of panorama sub-regions (the candidate landmarks). The sub-regions are detected using the proposed attention selection mechanism; in particular, we use the method introduced by [9] in the version presented in [10]. The attention mechanism builds a hierarchy of maps using weighted sums of features extracted at lower levels. The features used in this framework are: intensities, edges, corners, and local symmetries. First, feature maps are computed in parallel and then processed with a Difference-of-Gaussians filter in order to generate conspicuity maps. Using a weighted average, the individual conspicuity maps are fused into a single saliency map; a sketch of this fusion is given below. The saliency map is the input to the hierarchical selection process, which produces as output a set of image regions representing the candidate visual landmarks. Taking into account the proposed self-localization procedure and the experimental results, it was chosen to weight the vertical-edge and local-symmetry features more heavily than intensity. Figure 4 describes the proposed procedure for extraction of invariant texture structures, and three typically selected candidate landmarks. Figure 7 shows visual landmarks extracted by running the attention selection mechanism on four selected features, one at a time. Figure 8, left-hand side, shows typical visual landmarks extracted.
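An illustrative sketch of the conspicuity/saliency fusion. The feature detectors and weights below are stand-ins (Sobel and Harris operators, and a gradient-based substitute for the local-symmetry measure of [11], [12]); the paper's actual operators and experimentally tuned weights differ.

```python
import cv2
import numpy as np

def saliency_map(gray, w_vedge=0.4, w_corner=0.3, w_sym=0.2, w_int=0.1):
    """Fuse feature maps into a single saliency map, in the spirit of
    the Culhane-Tsotsos mechanism: per-feature maps, DoG filtering
    into conspicuity maps, then a weighted sum."""
    gray = gray.astype(np.float32) / 255.0
    # Vertical edges: horizontal intensity gradient
    vedge = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))
    # Corners: Harris response
    corner = np.abs(cv2.cornerHarris(gray, 2, 3, 0.04))
    # Crude stand-in for the local-symmetry feature of [11], [12]
    sym = np.abs(cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3))
    fused = np.zeros_like(gray)
    for m, w in [(vedge, w_vedge), (corner, w_corner),
                 (sym, w_sym), (gray, w_int)]:
        m = m / (m.max() + 1e-9)  # normalize each feature map
        # Difference-of-Gaussians -> conspicuity map
        dog = cv2.GaussianBlur(m, (0, 0), 2) - cv2.GaussianBlur(m, (0, 0), 8)
        fused += w * np.maximum(dog, 0)
    return fused
```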

3) Model Building The third step computes the positions of the candidate visual landmarks. By considering two panoramic views, and having the camera position for each view, it is possible to compute landmark positions by a stereo matching procedure. In particular, a candidate-landmark disparity map can be computed by looking for correspondences along epipolar lines. Cylindrical epipolar lines can be computed as in [8]. Figure 5, left-hand side, shows a schematic representation of the process for computing a visual landmark position; a sketch of the final ray intersection is given below. What is an optimal stereo reconstruction configuration? How can the noise be reduced while estimating landmark positions? In order to answer these questions, the main parameters involved in stereo reconstruction have to be considered, and eventually a compromise configuration has to be selected. Figure 5, right-hand side, shows the main parameters involved in stereo reconstruction, for a sample camera configuration and workspace.
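Once corresponding columns of the two panoramas are matched, each match gives an azimuth per camera (on a cylindrical panorama the column directly encodes the azimuth), and the landmark's ground-plane position follows from intersecting the two rays. A minimal sketch, ignoring the vertical coordinate (which would additionally give the landmark height):

```python
import numpy as np

def intersect_rays(c1, theta1, c2, theta2):
    """Intersect two horizontal viewing rays, one per panorama, to get
    the landmark position on the ground plane. c1, c2 are the camera
    centers; theta1, theta2 the world-frame azimuths of the landmark."""
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    # Solve c1 + t1*d1 = c2 + t2*d2 for the ray parameters t1, t2
    A = np.column_stack([d1, -d2])
    b = np.asarray(c2, float) - np.asarray(c1, float)
    t1, _ = np.linalg.solve(A, b)  # singular if the rays are parallel
    return np.asarray(c1, float) + t1 * d1
```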

Fig. 4.: The proposed procedure for candidate landmark extraction.

Fig. 5.: The left-hand figure is a schematic representation of the process for computing a landmark position from two cylindrical images of the environment. The right-hand figure shows the main parameters involved in stereo reconstruction, for a sample camera configuration and workspace. The uncertainty of the computed landmark pose depends on the angle a, whose width depends on the baseline length and on the distances d1 and d2. The workspace areas labeled R and S are the areas with the least uncertainty.

Landmarks that are not too close to the line connecting the camera positions (the baseline) can be reconstructed with less uncertainty. In the case of the workspace of figure 5, the best areas are consequently those labeled R and S; the relation between the angle a, the baseline, and the distances is sketched below. The best distance to a landmark depends on: the camera parameters, the visual-landmark dimensions, and the baseline length. The latter plays a very important role. The baseline length is a compromise between noise reduction (when computing distances to landmarks), region-matching reliability (when computing disparities between views), and perspective distortion (when observing environment characteristics). Experiments showed that the baseline length affects the matching of corresponding landmarks differently depending on the landmark texture. For example, a scalable and symmetric landmark can be successfully matched even with a very large baseline. Concerning the perspective distortion when observing environment characteristics, the matter is deferred to the next subsection. In summary, in order to establish how many panoramic views are needed and from which locations they should be taken, we need to consider: the camera parameters, the workspace layout (a wall map for the presented work), the visual-landmark dimensions inside the image-plane, and the observation areas (see next subsection). We are going to build a simulator to allow planning of good acquisition positions.
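For reference, the angle a of figure 5 follows from the triangle formed by the two camera positions and the landmark. A small sketch, assuming the distances d1, d2 and the baseline length are known:

```python
import numpy as np

def intersection_angle(baseline, d1, d2):
    """Angle a at the landmark between the two viewing rays, from the
    law of cosines applied to the (camera1, camera2, landmark)
    triangle. Reconstruction uncertainty grows as a approaches 0 or
    pi, i.e., for landmarks very far from, or nearly on, the baseline."""
    cos_a = (d1**2 + d2**2 - baseline**2) / (2.0 * d1 * d2)
    return np.arccos(np.clip(cos_a, -1.0, 1.0))
```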

4) Model Refinement The last step of computation refines the model built in the previous step. In particular, the model refinement step elects a subset of the candidate landmarks according to their computed positions. In fact, the distance and orientation to a landmark has a great influence on how the landmark appears in the camera image-plane. The acquired landmark image will be used as a reference image during self-localization, so a well-defined and low-distortion image will serve much better than a stretched and distorted one. In particular, a good reference image decreases the probability of mismatches and missed recognitions, and the effect of this is less noise in the estimated robot position. It was consequently considered very beneficial to discard candidate landmarks which have either a perspective distortion above an experimentally determined threshold or re-mapped dimensions below an established lower bound. In the latter case the risk is that the re-mapped texture would not provide enough discriminating information. The case where the re-mapped dimensions exceed an established upper bound is analogous: intensity values become averaged, since the system is trying to extrapolate information from the available pattern, and this leads to differences between observed and predicted intensity values. In order to determine the threshold, typical candidate landmarks were considered, and for each of them the workspace area where the landmark can be recognized without mismatches was computed. The result is the observation area. Figure 6, left-hand side, shows the computed observation areas for three typical landmarks. In conclusion, by knowing the positions of both camera and candidate landmarks, the system is able to test the observation conditions, and hence to discard candidate landmarks for which the camera position falls outside their observation area, and to elect those for which it falls inside. The elected landmarks will be used by the self-localization process; a sketch of the acceptance test is given below.
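An illustrative acceptance test for the refinement step. The 0.5x (compression) and 2.0x (enlargement) scale bounds echo the ones used in the experiments later in the paper; the viewing-angle bound, and the assumption that apparent scale varies inversely with distance, are simplifications standing in for the experimentally determined distortion threshold.

```python
import numpy as np

def accept_landmark(cam_xy, lm_xy, lm_normal, ref_dist,
                    min_scale=0.5, max_scale=2.0, max_angle_deg=60.0):
    """Decide whether a candidate landmark is observed well enough to
    keep in the model. ref_dist is the camera-landmark distance at
    acquisition time; lm_normal is the (unit) wall normal."""
    view = np.asarray(lm_xy, float) - np.asarray(cam_xy, float)
    dist = np.linalg.norm(view)
    # Re-mapped size is taken to scale inversely with distance
    scale = ref_dist / dist
    if not (min_scale <= scale <= max_scale):
        return False  # texture too compressed or too enlarged
    # Perspective foreshortening: angle between viewing direction
    # and the wall normal at the landmark
    cos_ang = abs(np.dot(view / dist, np.asarray(lm_normal, float)))
    ang = np.degrees(np.arccos(np.clip(cos_ang, 0.0, 1.0)))
    return ang <= max_angle_deg
```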

Fig. 6.: The left-hand figure shows the estimated observation areas for three typical landmarks. The right-hand top row shows the mobile robot system used for the experiments. The right-hand bottom row shows the two chosen locations for the acquisition of panoramas, labeled L and R. These positions allow landmarks to be found on walls Wa and Wb.

3 Experimentation

The aim of the experimentation phase was to determine the feasibility and performance of the proposed method in realistic situations. The algorithms were consequently implemented and tested on a real mobile robot system. The hardware for our experiments was a PC mounted on a mobile platform (Robuter by Robosoft) equipped with a color camera on a pan-tilt unit. Figure 6, right-hand top row, shows our system.

The robot workspace for our experiments was our laboratory, that is, an 8m x 5m room with plenty of obstacles. A wall map, representing the walls of the laboratory room, was provided and stored in a database. The experimentation is described for each step of the proposed method.

Panoramic Acquisition and Synthesis The robot was taken to two locations chosen following the considerations made in the last section. In particular, by taking into account the workspace wall-map and the diagrams of figure 6, the most suitable areas for visual landmarks were identified: walls Wa and Wb. These walls represented a good area for visual landmarks since they can be observed from both rooms inside the workspace. Consequently, the positions labeled L and R in figure 6 were chosen for panoramic acquisition. The two chosen locations were considered a good setup for our experiments, since they allow landmarks to be found on walls Wa and Wb, guaranteeing sufficient information for an accurate self-localization in most of the workspace area. However, an acquisition from four locations, as represented in figure 6, right-hand bottom row, by the points L1, R1, L2, R2, would be more advantageous for a real application, since a much bigger area would be covered. A dense sampling of workspace views was collected from locations L and R by panning the camera 360° relative to the robot heading. In particular, 72 images were acquired at each location (i.e., one image per 5° of rotation). It was decided to consider only the central area of each acquired image in order to discard areas affected by lens distortion. In particular, considering the available camera lens, an area of 350x350 pixels around the image center was selected; the whole image resolution was 512x512 pixels. We successfully experimented with building panoramic views using cylindrical projections. Figure 3 shows an example of a synthesized panoramic view. The overlapping area between two adjacent images was 85.7%, and each pixel was averaged over 7 different images; these figures are consistent, as the check below shows. Because of the dense sampling it is not possible to distinguish the individual images inside the panoramas. The only visible effect is that objects appear slightly unfocused, so the image becomes blurred. This effect had no apparent consequence for the subsequent computation steps. However, we are going to consider the use of a median filter, since it better preserves edges.
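A quick consistency check of these sampling numbers, assuming the standard relation between pan step, per-image horizontal field of view, and overlap:

```python
# With pan step delta and per-image horizontal FOV gamma, adjacent
# images overlap by a fraction 1 - delta/gamma, and each cylinder
# pixel is covered by about gamma/delta images.
delta = 5.0                      # degrees between successive shots
overlap = 0.857                  # measured overlap fraction
gamma = delta / (1.0 - overlap)  # implied FOV of the 350-pixel crop
print(gamma, gamma / delta)      # -> ~35.0 degrees, ~7 images per pixel
```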

Attention Selection Mechanism Once the panoramic view was synthesized, the proposed attention selection mechanism was run on it in order to extract candidate landmarks. In previous work, [5], we successfully experimented with the self-localization algorithm using visual landmarks with dimensions ranging from 28x28 to 32x32 pixels. Consequently, this range was set as the ideal size for the extracted candidate landmarks. A specialized landmark, i.e., one textured like the landmark at the top of figure 6, left-hand side, was put in the proximity of wall Wa in order to test the performance of our attention mechanism. The specialized landmark represents an ideal landmark because of its symmetry and scalability: it is very tolerant to scale errors and can be observed and matched successfully even under large distortion. As a consequence, the proposed selection mechanism should always select this landmark when it is visible. We ran the proposed algorithm for attention selection with edges, corners, intensity values, and local symmetries as features. The implemented algorithm followed the scheme described in figure 4, left-hand side. The Canny filter was used for edge detection, while the Canny filter followed by covariance analysis was used for corner detection. The intensity map was the input image, and for the local symmetry detector an algorithm was designed which searches for similarity along circular and straight lines, as described in [11] and [12].

First, the attention mechanism was tested on each feature separately; in this case the conspicuity map and the saliency map were equal. This was done in order to understand how each single feature affects the final selection. Figure 7 shows the visual landmarks extracted by running the attention selection mechanism on the four selected features, one at a time. The experiments showed that if only one feature is considered the result is not satisfactory for our purpose, that is, the result is not an invariant and discriminating texture. For example, when local symmetries were considered as the only feature, the attention selection worked fine for the specialized landmarks (they were selected). However, the response was not satisfactory for other extracted landmarks, since symmetric areas around horizontal edges were selected. A texture pattern only containing horizontal edges is not suitable for the proposed self-localization algorithm, since it is not a discriminating pattern: such a landmark may match the extracted area as well as the neighboring left and right areas. Examples of ambiguous landmarks are those selected near the table and blackboard horizontal edges in figure 7. The interesting outcome of this phase of experimentation was to show that the features need to be integrated. It was also important to understand which texture characteristics are enhanced by each selected feature; these are reported in figure 7. Many experiments were performed in order to establish appropriate weights for the feature maps in the attention selection procedure. A first group of runs showed the advantage of heavily weighting local symmetries in order to always select the specialized landmarks. However, these runs pointed out the need for an additional feature, namely vertical edges. These edges are very discriminating for the proposed self-localization algorithm, since they constrain the landmark on horizontal lines. The typical results achieved by including vertical edges, corners, edges, intensity values, and circular symmetries as features are shown in figure 8, left-hand side, for two different panorama regions.

Fig. 7.: Visual landmarks extracted by running the attention selection mechanism on four selected features, one at a time. The top row shows the conspicuity maps. The bottom row shows six selected visual landmarks.

Fig. 8.: The left-hand side shows typical visual landmarks extracted by including vertical edges, corners, intensity values, and circular symmetries as features. The right-hand side shows the estimated positions of the landmarks represented in the left-hand pictures.

Stereo Reconstruction The stereo matching procedure was tested on six typical extracted candidate landmarks. For each landmark, its appearance in one of the two stereo views was used as a template to be matched in the other view. The template matching (normalized cross-correlation) succeeded for each of the considered landmarks. In particular, we found that with texture patches like most of the extracted visual landmarks, normalized cross-correlation gives reliable results and very few mismatches. However, the landmark position in the image-plane is estimated with noise, as described in [3], and this may lead to errors in the landmark 3D position in the workspace. Another source of error is noise in the estimated panorama pose configuration. Figure 8, right-hand side, shows the estimated positions of the landmarks represented in the left-hand pictures.

Observation Areas Once the candidate landmark positions were computed, it was possible to estimate the landmark observation areas. An allowed range for landmark texture distortion was established: the landmark should not be compressed to less than half of its original size in either dimension (width or height), since below this value the landmark texture may lose most of its characteristics, leading to mismatches when compared with the observed landmark texture. The landmark should also not be enlarged to more than double its original size, since the effect of the averaging filter can lead to textures which are either too smooth or too different from the original. As a consequence, the landmark cannot be observed from a position which is too close, too far away, or at too large an angle to the landmark. In order to test how the generated perspective distortion affects the landmark texture, landmarks were observed from a grid of viewpoints. For each point, the observed landmark texture was compared with the corresponding predicted texture. The predicted texture was obtained by re-mapping the candidate-landmark texture onto the camera image-plane, based on the estimated camera pose. Observed and re-mapped landmark textures were compared using normalized cross-correlation. The comparison was performed for each position on the grid, and we noted the grid positions where the matching failed; a sketch of this test is given below. Figure 6, left-hand side, shows the computed observation areas for three landmarks.
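A sketch of the grid test. Here observe_fn and remap_fn are hypothetical helpers returning, respectively, the observed and the predicted (re-mapped) landmark texture at a grid position, and the NCC threshold is illustrative:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def observation_area(lm_texture, observe_fn, remap_fn, grid_xy,
                     ncc_thresh=0.7):
    """Estimate the workspace area from which a landmark is still
    recognized: keep the grid positions where observed and predicted
    textures match."""
    return [p for p in grid_xy
            if ncc(observe_fn(p), remap_fn(lm_texture, p)) >= ncc_thresh]
```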

4 Conclusion

This paper proposed a method for autonomous robot navigation with automatic learning of visual landmarks. Visual landmarks are a useful feature to exploit for vision-based robot navigation, and they do not require any specially structured environment. The combined use of visual landmarks, triangulation, and optimal triplet selection can reduce noise when computing the robot position. Moreover, as demonstrated by the presented work, the procedure for learning visual landmarks can be automated. A method for a robust acquisition of landmarks was described, and the experimental results achieved were presented. The experiments confirmed the potential and the feasibility of the proposed method. The achieved results also form the basis of a next step of experimentation. In particular, a next step would be to estimate the observation areas with more precision, and to thoroughly study the uncertainty of the stereo reconstruction method. The interesting aspect of the latter is that the estimated uncertainty in landmark location can be used during self-localization to evaluate the optimal landmark triplet.

References

1. J. Neira, M.I. Ribeiro, and J.D. Tardos. Mobile robot localization and map building using monocular vision. Symposium on Intelligent Robotics Systems (SIRS), 1997.
2. A.J. Davison and D.W. Murray. Mobile robot localization using active vision. European Conference on Computer Vision (ECCV), 1998.
3. C.B. Madsen and C.S. Andersen. Optimal landmark selection for triangulation of robot position. Journal of Robotics and Autonomous Systems, 1998.
4. J.L. Crowley. Mathematical foundation of navigation and perception for an autonomous mobile robot. Reasoning with Uncertainty in Robotics, 1996.
5. S. Livatino and C.B. Madsen. Optimization of robot self-localization accuracy by automatic visual-landmark selection. Scandinavian Conference on Image Analysis (SCIA), 1999.
6. P.E. Trahanias, S. Valissaris, and T. Garavelos. Visual landmark extraction and recognition for autonomous robot navigation. International Conference on Intelligent Robots and Systems (IROS), 1997.
7. C. Balkenius. Spatial learning with perceptually grounded representations. Robotics and Autonomous Systems, Special Issue on Autonomous Mobile Robots, 1998.
8. L. McMillan and G. Bishop. Plenoptic modeling: an image-based rendering system. SIGGRAPH 95, 1995.
9. S.M. Culhane and J.K. Tsotsos. An attentional prototype for early vision. European Conference on Computer Vision (ECCV), 1992.
10. C.S. Andersen. A Framework for Control of a Camera Head. PhD thesis, Aalborg University, Denmark, 1996.
11. O. Hansen. On the Use of Local Symmetries in Image Analysis and Computer Vision. PhD thesis, Aalborg University, Denmark, 1992.
12. G.H. Granlund and H. Knutsson. Signal Processing for Computer Vision. Kluwer Academic Publishers, 1995.
13. P. Zingaretti and A. Carbonaro. Route following based on adaptive visual landmark matching. Robotics and Autonomous Systems, Special Issue on Autonomous Mobile Robots, 1998.
