Qualitative Map Learning Based on Co-Visibility of Objects

Takehisa Yairi, Non-member, Koichi Hori, Member, IEEE, and Kosuke Hirama, Non-member
Abstract: Autonomous map construction is one of the most fundamental and significant issues in intelligent mobile robot research. While a variety of map construction methods have been proposed, most require quantitative measurements of the environment and a mechanism for precise self-localization. This paper proposes a novel map construction method that uses only qualitative information about "how often two objects are observed simultaneously." The method is based on a heuristic, "closely located objects are likely to be seen simultaneously more often than distant objects," and on a well-known multivariate data analysis technique, multi-dimensional scaling (MDS). A significant feature of this method is that it requires neither quantitative sensor measurements nor information about the robot's own position. Simulation and experimental results demonstrate that the method can rapidly capture the qualitative spatial relationships among identifiable landmark objects and is sufficiently practical.

Index Terms: Mobile robot, map building, qualitative information, co-visibility
(Manuscript received April 17, 2004; revised October 24, 2004. This work was supported in part by the Tateishi Science and Technology Foundation. T. Yairi and K. Hori are with the Research Center for Advanced Science and Technology, University of Tokyo; e-mail: [email protected].)

I. INTRODUCTION

A map is a spatial model of the environment and is indispensable for a mobile robot to reasonably perform a variety of tasks, such as path planning, self-localization, and information exchange with other robots. Therefore, the problem of mobile robot map building in unknown environments is a central issue in intelligent robotics research. As a result, a wide range of map building methods have been developed, depending on the tasks required of the robot (e.g., path planning, self-localization, cooperative tasks), the environment (e.g., indoor, corridor, planetary surface), the sensors (e.g., sonar, laser range finder, vision), and the actuators (e.g., wheels, legs). Landmark-based or feature-based map learning, in which the positions of identifiable objects in the environment are estimated from a series of observation data, is representative of these methods [1], [2], [3], [4], [5], [6], [7]. The common approach in this methodology is to build quantitatively accurate maps based on quantitative information about the robot's own position and the relative distances and directions to external landmark objects. Our main motivation for this study came from several simple but fundamental questions regarding this conventional approach:

• Are quantitative sensor measurements necessary for building maps?
• Is self-localization necessary for building maps?
• Is a quantitatively accurate map necessary for a mobile robot?
It seems true that humans can grasp approximate spatial relationships among objects without quantitatively accurate measurements of self-position and relative distances, and we are able to infer the positions both of ourselves and of other objects even with numerically inaccurate approximate maps. This raises the question of why a robot cannot also build and utilize a map in a similar way.

In this paper, we attempt to answer the first two of the above questions by proposing a map learning method for mobile robots that builds approximate object-based maps using only qualitative and very coarse observation information. Specifically, we consider co-visibility information, i.e., information about whether two objects are visible together. In the proposed method, co-visibility frequencies for all pairs of landmark objects are computed from the qualitative observation information, and empirical distances among them are estimated based on a heuristic rule: "if two objects are often seen together, then they are likely to be located close to each other." Then, a qualitative landmark-based map is constructed by applying multi-dimensional scaling (MDS) to the matrix of empirical distances. This can be viewed as a unique map building method that utilizes heuristics regarding the relationship between the temporal proximity of object observations and the spatial distances among the object locations, along with a constraint on the object configurations in a 2-D Euclidean space. This method has three features that are remarkable from a practical viewpoint:

• It does not require the robots to localize their own positions during the map building process; that is, it decouples the two problems of map building and self-localization.
• It does not require quantitative sensor measurements such as relative distances and directions to external objects, but uses only qualitative co-visibility information, i.e., information about "what objects were seen together."
• Despite using only this limited information, it is able to learn qualitatively very accurate object maps and is robust against observation errors.
The rest of this paper is organized as follows. In section II, we review and classify conventional map building methods and clarify the standpoint of our approach. In section III, we describe assumptions regarding the robots, environments, and map building tasks. In section IV, we explain the proposed map learning method based on the co-visibility of objects. In sections V and VI, we present simulation and experimental results, respectively, and evaluate the performance of the method. In section VII, we discuss the validity of our assumptions, current limitations, and possible enhancements of the method. Finally, in section VIII, we present some concluding remarks and directions for further study.

II. RELATED WORK

A. Conventional Map Learning Methods

The problem of map building in unknown environments has long been a central theme in intelligent mobile robot research, and a number of methods have been proposed to date. These map learning methods are often classified into metric and topological methods based on the manner of map representation [8], [9], [10]. The metric methods (e.g., [11], [3]) generally attempt to capture and represent geometric features of the environment, such as object positions, sizes, and shapes, quantitatively or numerically. On the other hand, the topological methods (e.g., [12], [13]) attempt to represent qualitative spatial relationships among "distinctive" places in the environment by graph or network structures.

However, for a number of reasons, it is not always appropriate to simply classify all map building methods into these two categories. First, although both grid-based mapping (e.g., [11], [14]) and landmark-based mapping (e.g., [1], [2]) are usually regarded as metric methods, they are fundamentally different in their purposes: the former attempts to determine which parts of the environment are occupied by objects but does not concern itself with what each object is, whereas the latter attempts to identify all objects in the environment and estimate their positions individually. Second, while most landmark-based map building methods aim at quantitative accuracy, i.e., how accurate the absolute coordinates of each object are, others focus on qualitative accuracy, i.e., how accurately the qualitative spatial relationships among the object locations are preserved in the maps [15]. This is also the case with graph-based maps. While most aim to represent qualitative relationships among places, such as reachability from one
place to another, by graph structures, some graph-based methods emphasize quantitative properties, such as relative distances and directions among places [16], [17], [18]. In addition to the ambiguity in the traditional metric/topological classification, some hybrid approaches that explicitly combine different methods of map representation have emerged recently [19], [20]. Therefore, in the rest of this section, we attempt to classify the various map building methods in more detail than the traditional binary metric/topological categorization, which will differentiate our method from those reported previously.

1) Classification by Map Representation: The most natural way of classifying map building methods is by differences in map representation. In the most general sense, a map can refer to any spatial model of the environment, and maps vary in the entities and properties they extract from the environment. Therefore, all map building methods can be classified in terms of "what" the maps represent and "how" this is done.

With regard to the "what" dimension, the grid-based map or occupancy grid map [11], [21], [14], [22] is considered the most basic map representation. In the grid-based map, the environment is divided into a set of small grid regions, each of which is assigned a number between 0 and 1 indicating the probability that the region is occupied by some object. An important characteristic of the grid-based map is that it does not distinguish between individual objects; in other words, the objects on the map do not have IDs. In contrast, feature-based or landmark-based maps [1], [2], [23], [15], [4] represent the positions of identified objects. While most landmark-based mapping methods assume that the landmark objects or features have been defined beforehand, some methods utilize visual features that are extracted autonomously from visual data [6]. Another typical map representation is the graph-based map, or the so-called topological map [24], [16], [12], [25], [26], [27], [13], [17], [28], [18]. In the graph-based map, some important or "distinctive" places in the environment and the transition paths among them are represented by a graph structure consisting of nodes and edges. Naturally, there are also various combinations of these map representation methods. For example, in [3], [7], [29], grid-based maps were learned first from range sensors and then converted into feature-based maps by extracting and labeling feature objects autonomously. Similarly, in [19], a grid-based map was learned first, and then a topological map was constructed by partitioning the obtained grid-based map into a set of coherent regions. Furthermore, Kuipers [20] proposed a general framework integrating the various map representations mentioned above.

With regard to the "how" dimension, these map representations can be classified in terms of whether they represent qualitative or quantitative aspects of the environment. For example, while most landmark-based mapping methods emphasize the numerical accuracy of the estimated object positions [1], [2], some others focus on how correctly the qualitative spatial relationships among the objects are represented [15]. Similarly, most graph-based maps focus on qualitative relationships, such as connectivity between the distinctive places [12], [26], whereas others focus on quantitative relationships, such as distance and direction [16], [18]. Table I shows the classification of the map learning methods based on these two dimensions. As can be seen from this, our co-visibility-based map learning represents the qualitative spatial relationships between landmark objects.

TABLE I
CLASSIFICATION OF MAP BUILDING METHODS BY MAP REPRESENTATION
(rows: how it is represented; columns: what is represented on the map)

              | Object location             | Distinctive places and paths    | Obstacle existence
Qualitative   | Qualitative object-based:   | Qualitative graph-based:        | -
              | Sogo [15], CoviMap [ours]   | Mataric [12], Koenig [27],      |
              |                             | Shatkay [13], etc.              |
Quantitative  | Quantitative object-based:  | Quantitative graph-based:       | Grid-based:
              | Uhlman [2], Feder [4], etc. | Engelson [16], Golfarelli [17], | Moravec [11], Elfes [21],
              |                             | Duckett [18], etc.              | Lu [30], etc.

2) Classification by Information Used for Mapping: Another important factor characterizing robot map building methods is the type of information used to build the map. Generally, existing map building methods utilize a variety of information sources or sensors, ranging from proprioceptive sensors, such as gyros and encoders, to exteroceptive sensors, such as sonar and vision, and positional sensors, such as GPS and beacons. As there are an infinite number of ways of combining these low-level information sources, we attempt here to classify the map building methods from a slightly more abstract viewpoint, i.e., (1) the characteristics (qualitative or quantitative) of the information used mainly for map building, and (2) the method of acquiring and utilizing the robot's own position.

First, map building methods can be classified based on whether they mainly use qualitative or quantitative information. For example, while most grid-based methods (e.g., [11]) and landmark-based methods (e.g., [1]) principally use quantitative information, such as relative distance and direction to external objects, to build maps, some early graph-based methods (e.g., [12]) mainly use qualitative information regarding the order of the places visited to estimate graph structures. A unique map building method based on qualitative information was reported in [15], in which qualitative information about "how object positions are divided into two sets with respect to arbitrary straight lines" is used to construct an object map by propagating "three point" constraints. In our method, very coarse qualitative observation information, designated "co-visibility," i.e., information regarding whether two objects are visible together, is mainly used to build an object map.

As mentioned previously, another dimension of classification based on the information used for mapping is the method of acquiring and using the positional information of the robots themselves. In early work on both metric and topological methods [11], [22], [12], it was usually assumed that accurate robot positions could be obtained from external sources or simple dead-reckoning procedures. In contrast, modern map learning frameworks called SLAM (Simultaneous Localization and Mapping) or CML (Concurrent Mapping and Localization) build an environmental map and estimate the robot's position concurrently and iteratively [4], [3], [23]. There is another approach to this problem [30], [18], which estimates consistent relative spatial relationships between observation positions based on dead-reckoning information and observation data. Unlike these conventional approaches, our map building method does not require the robot's own position, i.e., it does not assume that positional information is provided by external sources or estimated by the robot itself. Table II shows the classification of map building methods based on these two axes.

TABLE II
CLASSIFICATION OF MAP BUILDING METHODS BY TYPE OF INFORMATION
(rows: characteristics of the information used; columns: method of self-localization)

              | External / Dead-reckoning   | Concurrent estimation        | Local map alignment    | No self-localization
Qualitative   | Classical topological:      | -                            | -                      | CoviMap [ours]
              | Kuipers [24], Mataric [12], |                              |                        |
              | etc.                        |                              |                        |
Quantitative  | Classical metric:           | SLAM / CML:                  | Duckett [18], Lu [30], | -
              | Moravec [11], Borenstein    | Thrun [23], Castellanos [3], | etc.                   |
              | [14], etc.                  | Feder [4], etc.              |                        |
B. Qualitative Navigation and Mapping

The main purpose of our map building method is to obtain a qualitative model representing relative positional relationships between landmark objects from the co-visibility information. A fundamental question about such a qualitative mapping method is how a mobile robot can utilize this type of map to navigate through the environment, or, more specifically, how the robot can localize itself and plan a path to a given goal using the map. Qualitative navigation [31], [32], which is part of the more general framework of Qualitative Spatial Reasoning (QSR) [33], deals with this topic. With this methodology, mobile robots are expected to be able to estimate their own positions and infer paths to goals qualitatively by matching qualitative observations about the relative spatial relationships among the landmarks against the qualitative maps. Moreover, such a qualitative navigation method can be used to make an approximate global plan quickly, in advance of making a detailed path plan with a metric map.

A significant problem related to this methodology is how to construct such a qualitative map, especially from qualitative observation data. As already mentioned, Sogo [15] proposed an interesting approach to this problem, in which qualitative positions of landmark objects are learned from qualitative information about the motion directions of the objects observed by a moving robot. In this paper, we also propose a qualitative landmark-based map building method using only qualitative observations. A major difference between our method and that of Sogo [15] is that ours uses more primitive and coarse qualitative information about whether two objects are visible together from a robot position. It also differs from the existing qualitative mapping methods in that the process of producing a qualitative map from qualitative observation data is itself based on a quantitative (numerical) technique called multi-dimensional scaling (MDS).

III. PROBLEM DEFINITION AND ASSUMPTIONS

We consider a map building task by a mobile robot, in which the robot estimates the positions of objects in the environment by repeated explorations and observations (Figure 1). More specifically, we make the following assumptions about the environment and the robot.

Fig. 1. Map building task for a mobile robot (move, observation, and map estimation).

A. Environment and Landmark Objects

First, we consider the environment as a closed area containing a finite number of objects. Each object is assigned a unique ID and can be identified by its appearance. In addition, it is assumed that all objects are about the same size and are not located too close to one another. A room containing various medium-
sized objects, such as desks, sofas, and shelves, is an example of this kind of environment, whereas a corridor composed of similar, view-blocking walls is not.
B. Map To Be Built

As mentioned above, the robot builds a "qualitative" object map, i.e., a two-dimensional map representing the locations of objects in the environment. "Qualitative" means that the map is not necessarily accurate in terms of quantitative measurements, such as the absolute positional errors of the objects, but it still preserves the correct qualitative spatial relationships among them, such as "Object A is next to object B" and "Seen from object A, C is to the right of B." In the next section, we will explain how the qualitative correctness of obtained maps can be evaluated.

C. Perception

It is assumed that the robot obtains a panoramic image as its sensory input or raw information source at each location visited. A panoramic image may be obtained directly by a wide-angle camera (typically, an omni-directional vision sensor such as [34]) or may be synthesized from a set of narrower images taken in different directions at one location. Each panoramic image is then processed by some image recognition technique and transformed into a list of the objects identified at that point. The resulting lists of visible objects at various locations are then used to build a map. It should be noted that this information contains no quantitative measurements, such as distances and directions to the objects.
From a practical point of view, the influence of recognition errors on the quality of maps is a significant problem, because all existing object recognition methods are subject to error. In a later simulation, we will examine how our map learning method is affected by recognition errors.

D. Visibility of Objects

We make the weak assumption that it becomes more difficult to recognize an object as its distance from the robot increases. Although the degree of validity of this assumption depends on the environment, the objects, and the recognition algorithm, we consider it to be roughly appropriate for the following reasons:

• As the distance between an object and the robot increases, the image size of the object in the scene becomes smaller, and quantization error makes its shape and pattern unclear.
• As the distance increases, the probability that the object is occluded by other objects from the robot's position also increases.
We will discuss this issue later in section VII-A.

E. Co-visibility and Distance Between Objects

Based on the above assumption, we consider the heuristic rule, "If two objects are observed together frequently, then they are likely to be located close to each other." This can also be rephrased as "the probability that two arbitrary objects are visible together decreases monotonically as the spatial distance between them increases." Moreover, this rule can be regarded as a special case of a more general heuristic rule, "temporal and spatial proximities are approximately equivalent," which is considered to be used implicitly by humans in their daily lives. Although it is difficult to prove this rule strictly in general cases, it is relatively easy to show its validity in a restricted environment as follows (Figure 2). First, we assume that the probability that an object located at a distance r from the robot is visible equals

P_{vis}(r) = e^{-a \cdot r^2}    (1)

where a is a positive constant. Now, let two objects A and B be located at (-l/2, 0) and (l/2, 0), respectively, in a plane. Then, the probability that both A and B are visible from a robot position X = (x, y) can be calculated as

P_{covis}(l, x, y) = P_{vis}(\sqrt{(x + l/2)^2 + y^2}) \cdot P_{vis}(\sqrt{(x - l/2)^2 + y^2}) = e^{-a l^2 / 2} \cdot e^{-2a(x^2 + y^2)}    (2)
Fig. 2. A simplified model of co-visibility: two objects A and B separated by distance l, observed from a point X = (x, y).
Therefore, if the environment is sufficiently large, the expected area of the region where the two objects are visible simultaneously can be calculated as:

S_{A,B}(l) = e^{-a l^2 / 2} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-2a(x^2 + y^2)} \, dx \, dy = e^{-a l^2 / 2} \int_{-\infty}^{\infty} e^{-2a x^2} dx \cdot \int_{-\infty}^{\infty} e^{-2a y^2} dy = \frac{\pi}{2a} e^{-a l^2 / 2}    (3)
The probability that the two objects are observed at the same time can be considered approximately proportional to this value, provided that the robot moves around the environment randomly for a sufficiently long time. As a result, the co-visibility probability of two objects decreases monotonically as the distance l between them increases.

F. Exploration

The robot moves around the environment to make observations at various locations. Ideally, every area in the environment should be explored equally by the robot. However, such an ideal exploration is practically impossible because the actual motion of the robot is determined by its path planning algorithm and is restricted by the presence of obstacles. In the simulation, we employ a relatively simple movement algorithm that combines a probabilistic selection of moving directions with obstacle avoidance behavior, and examine how the quality of maps is affected by the non-uniformity of exploration.
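This monotonic relationship is easy to check numerically. The following is a minimal Monte Carlo sketch (a hypothetical illustration only, not the simulator used in this paper's experiments), assuming the Gaussian visibility model of (1) and uniformly random observation positions:

```python
import numpy as np

def covis_rate(l, a=1.0, n_samples=200_000, half_width=5.0, seed=0):
    """Monte Carlo estimate of how often two objects separated by l are
    seen together from a uniformly random observation point."""
    rng = np.random.default_rng(seed)
    xy = rng.uniform(-half_width, half_width, size=(n_samples, 2))
    pa = np.array([-l / 2.0, 0.0])
    pb = np.array([l / 2.0, 0.0])
    # Visibility model of (1): P_vis(r) = exp(-a r^2).
    p_a = np.exp(-a * ((xy - pa) ** 2).sum(axis=1))
    p_b = np.exp(-a * ((xy - pb) ** 2).sum(axis=1))
    both = (rng.random(n_samples) < p_a) & (rng.random(n_samples) < p_b)
    return both.mean()

# The estimate decays in proportion to exp(-a l^2 / 2), as derived in (3).
for l in (0.5, 1.0, 1.5, 2.0):
    print(l, covis_rate(l))
```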
Fig. 3. Flow of the CoviMap algorithm: move and observe, recognize the visible objects, update the counts n_i and n_{i,j}, and, when enough data have been gathered, compute the co-visibility frequencies, the empirical distances, and the object coordinates by MDS.
IV. PROPOSED METHOD

A. Outline of Co-visibility-based Mapping

In this section, we introduce CoviMap, a map learning method based on the co-visibility of objects. In this method, the mobile robot estimates the approximate positions of the N objects in the environment based on their co-visibility frequencies, which are updated repeatedly through exploration and observation. The outline of CoviMap is illustrated in Figure 3 and described below (a code sketch of the counting in step 1 is given after the list):

1) The robot repeats the following behavior steps and updates the number of times each object is observed (n_i) and the number of times each pair of objects is observed together (n_{i,j}).
   a) The robot moves to the next observation position based on a given motion control rule, avoiding collisions with the objects.
   b) It obtains a list of visible objects Λ_o from a panoramic image captured at the current position, and updates n_i and n_{i,j} as below:
      n_i ← n_i + 1 (for each object i in Λ_o)
      n_{i,j} ← n_{i,j} + 1 (for each pair i, j in Λ_o)
2) After a specified number of steps, the co-visibility frequency f_{i,j} is computed for each pair of objects based on n_i, n_j, and n_{i,j}.
3) Then, the squared empirical distance δ²_{i,j} of each pair is computed from f_{i,j} by the empirical distance function φ.
4) The robot obtains the estimated positions of all the objects (x̂_1, ..., x̂_N) by applying Multi-Dimensional Scaling (MDS) to the distance matrix ∆ whose (i, j) element is δ²_{i,j}.
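As referenced in the outline above, the counting in step 1 amounts to simple set-based bookkeeping. A minimal Python sketch (hypothetical names; `visible` stands for the recognized object-ID list Λ_o):

```python
from collections import defaultdict
from itertools import combinations

n = defaultdict(int)        # n[i]: times object i was observed
n_pair = defaultdict(int)   # n_pair[(i, j)]: times i and j were observed together

def update_counts(visible):
    """Update observation counts from one panoramic observation.
    `visible` is the list of object IDs recognized at the current position."""
    ids = sorted(set(visible))
    for i in ids:
        n[i] += 1
    for i, j in combinations(ids, 2):   # each unordered pair counted once
        n_pair[(i, j)] += 1

update_counts([0, 3, 4])   # e.g., objects 0, 3, and 4 were seen together
update_counts([2, 3, 5])
```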
In the remainder of this section, we explain the details of the last three steps. We also explain the methods used to evaluate the obtained maps qualitatively.

B. Computation of Co-visibility Frequency and Empirical Distance

First, the co-visibility frequency f_{i,j} between two objects i and j is defined as follows:

f_{i,j} = \frac{n_{i,j}}{n_i + n_j - n_{i,j}}    (4)

That is, f_{i,j} is the conditional probability that the two objects are visible at the same time, given that at least one of them is visible. From the definition, it is obvious that f_{i,j} = f_{j,i} and 0 ≤ f_{i,j} ≤ 1. This definition of f_{i,j} is also known as Jaccard's coefficient.

Next, we consider the heuristic rule introduced in the previous section: "Closely located objects are likely to be seen simultaneously more often than distant objects, and vice versa." With the definition of the co-visibility frequency f_{i,j}, this heuristic can be rewritten as "the distance d_{i,j} between two objects decreases monotonically as f_{i,j} increases." Figure 4 (scattered points) illustrates the actual relationship
between the real (squared) distance and the co-visibility frequency f_{i,j} in the simulation environment of section V. The result indicates that the heuristic is approximately appropriate.

Therefore, we introduce the notion of the squared empirical distance δ²_{i,j} between an arbitrary pair of objects, which is defined by a monotonically decreasing non-negative function φ:

δ²_{i,j} = φ(f_{i,j})  (i ≠ j)    (5)

We call φ the empirical distance function. When two objects are identical (i = j), the empirical distance is defined as zero, i.e., δ_{i,i} = 0. We also define an empirical distance matrix ∆ whose (i, j) element equals δ²_{i,j}. ∆ is an N-dimensional symmetric matrix (i.e., δ²_{i,j} = δ²_{j,i}), and its diagonal components (δ²_{i,i}) equal zero.
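Given such counts, the co-visibility frequencies of (4) and the empirical distance matrix ∆ of (5) can be assembled as follows (a sketch; `phi` stands for any monotonically decreasing non-negative empirical distance function, such as those introduced later in section V):

```python
import numpy as np

def empirical_distance_matrix(N, n, n_pair, phi):
    """Squared empirical distance matrix Delta from co-visibility counts.
    f_ij is the Jaccard coefficient of (4); phi is the empirical
    distance function of (5)."""
    delta2 = np.zeros((N, N))
    for i in range(N):
        for j in range(i + 1, N):
            nij = n_pair.get((i, j), 0)
            denom = n.get(i, 0) + n.get(j, 0) - nij
            f = nij / denom if denom > 0 else 0.0
            delta2[i, j] = delta2[j, i] = phi(f)   # diagonal stays zero
    return delta2
```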
C. Map Construction Based on Multi-Dimensional Scaling (MDS)

Multi-dimensional scaling (MDS) is a multivariate data analysis technique used to visualize a potentially high-dimensional data structure by mapping it into a relatively low-dimensional space [35], [36]. The fundamental purpose of MDS is to find an optimal configuration of objects in a low-dimensional space when the dissimilarities or distances among them are given. In CoviMap, MDS is utilized to obtain a configuration, i.e., a set of 2-D positions of the objects, from the set of empirical distances among them.
Fig. 4. Relationship between the squared distance d²_{i,j} and the co-visibility frequency f_{i,j} of objects in the simulation environment.
While there are many kinds of MDS methods [35], [36], the current version of CoviMap employs classical scaling [37], one of the most basic metric MDS methods. The procedure of map building based on classical scaling is summarized as follows. For simplicity, we assume that the center of gravity of all the objects coincides with the origin of the coordinates. First, we apply double centering to the empirical distance matrix ∆:

B_∆ = -\frac{1}{2} H ∆ H

where H is the centering matrix defined as H = I - 1 \cdot 1^T / N. Here, I is the N-dimensional identity matrix, and 1 is the N-dimensional vector whose elements all equal 1. B_∆ can be regarded as an estimate of the scalar product matrix B = X X^T, where X = [x_1, ..., x_N]^T is the coordinate matrix of the objects. Next, we compute the eigendecomposition B_∆ = V Λ V^T, where Λ is a diagonal matrix whose diagonal elements are the eigenvalues of B_∆, and V is a matrix whose columns are the corresponding eigenvectors. Now consider the two-dimensional diagonal matrix Λ⁺ whose diagonal elements are the two largest eigenvalues λ_1 and λ_2, and the N × 2 matrix V⁺ whose columns are the corresponding eigenvectors v_1 and v_2. Then, an estimate of the coordinate matrix X̂ is given by X̂ = V⁺ (Λ⁺)^{1/2}. That is, the estimated position of the i-th object is x̂_i = [\sqrt{λ_1} v_{1,i}, \sqrt{λ_2} v_{2,i}]^T.
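This classical scaling procedure translates almost line for line into code. A minimal numpy sketch (assuming `delta2` is the squared empirical distance matrix ∆; the recovered coordinates are determined only up to rotation, translation, and reflection):

```python
import numpy as np

def classical_mds(delta2, dim=2):
    """Classical (Torgerson) scaling: recover 2-D coordinates from a
    matrix of squared empirical distances."""
    N = delta2.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N        # centering matrix H = I - 11^T/N
    B = -0.5 * H @ delta2 @ H                  # double centering: estimate of XX^T
    evals, evecs = np.linalg.eigh(B)           # eigendecomposition (ascending order)
    order = np.argsort(evals)[::-1][:dim]      # keep the two largest eigenvalues
    lam = np.clip(evals[order], 0.0, None)     # guard against small negative values
    return evecs[:, order] * np.sqrt(lam)      # X_hat = V+ (Lambda+)^(1/2)

# positions[i] is the estimated 2-D coordinate of object i:
# positions = classical_mds(delta2)
```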
D. A Few Remarks on the Empirical Distance Function and MDS

In the proposed map building method described above, three practical issues remain to be resolved.

The first problem is how to decide on a proper empirical distance function φ. As the only constraint imposed on φ is that it must be a non-negative monotonically decreasing function, there are an infinite
number of candidates. Generally, the optimal choice of φ depends on a variety of factors, ranging from the properties of the environment and the objects within it to the methods of image processing and object recognition used by the robot itself. If actual data, such as those shown in Figure 4, are available, the function can be readily determined by function approximation techniques such as least-squares fitting. In most cases, however, such data are not available. One possible approach in this case is to select the function that minimizes the stress value of the MDS results.

The second problem is that φ takes a constant value φ(0) for every pair of objects whose co-visibility frequency f_{i,j} is zero. Our solution is to modify the empirical distance δ_{i,j} for each pair whose f_{i,j} is very small by using the triangle inequality as a constraint, i.e., δ_{i,j} is modified by the following equation:

δ_{i,j} ← \min_k (δ_{i,k} + δ_{k,j})    (6)
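One way to implement this repair is a Floyd-Warshall-style relaxation restricted to the unreliable pairs. A sketch (assumptions: `delta` holds the non-squared empirical distances, and pairs with co-visibility frequency below a hypothetical threshold `f_small` are treated as unreliable):

```python
import numpy as np

def enforce_triangle_inequality(delta, f, f_small=0.01):
    """Replace the empirical distance of rarely co-visible pairs with the
    shortest indirect path, as in (6)."""
    d = delta.copy()
    rare = f < f_small                      # pairs whose phi(0) value is unreliable
    # Relaxation over all intermediate objects k (iterate to convergence if desired).
    for k in range(d.shape[0]):
        via_k = d[:, k][:, None] + d[k, :][None, :]
        d = np.where(rare, np.minimum(d, via_k), d)
    return d
```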
Simulation results in a later section show that this constraint is very effective in some cases.

The last problem is that a map obtained by MDS may be a "mirror image" of the real configuration (this occurs with 50% probability), because the distance between an arbitrary pair of objects is unchanged even if the map is turned over. In actual situations, however, this will not be very problematic, because we can easily detect and correct such mirror images if the true orientation (i.e., clockwise or anti-clockwise) of even a single triangle is provided.

E. Evaluation Criteria for Qualitative Maps

To evaluate the qualitative accuracy of the obtained maps, we employ the following two methods of measuring differences between constructed and real maps.

1) Evaluation by Delaunay Graph Comparison: One way of evaluating the accuracy of a constructed map qualitatively is to compare its Delaunay graph with that of the real map (Figure 5). The details of this criterion are as follows. First, Delaunay triangulation is applied to both the real map and the constructed map, regarding the objects on the maps as vertices. Consequently, we obtain two undirected graphs called Delaunay graphs. Then, we define the Delaunay error by the following equation:

Err_{del} ≡ \frac{Diff_{del}}{3N - 6}    (7)

where N and Diff_{del} are the number of objects and the number of inconsistent edges between the two graphs, respectively. It should be noted that 3N - 6 is the maximum number of edges generated by triangulating N points. Diff_{del} can easily be computed by counting the number of differing elements between the adjacency matrices corresponding to the two graphs.
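With a Delaunay triangulation routine available, Err_del reduces to comparing two adjacency matrices. A sketch using scipy's `Delaunay` (substituted here purely for illustration; the paper itself uses the Bowyer-Watson algorithm):

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_adjacency(points):
    """Boolean adjacency matrix of the Delaunay graph of 2-D points."""
    adj = np.zeros((len(points), len(points)), dtype=bool)
    for simplex in Delaunay(points).simplices:   # each simplex is a triangle
        for i in range(3):
            a, b = simplex[i], simplex[(i + 1) % 3]
            adj[a, b] = adj[b, a] = True
    return adj

def delaunay_error(real_pts, map_pts):
    """Err_del of (7): fraction of inconsistent edges, normalized by 3N - 6."""
    diff = delaunay_adjacency(real_pts) != delaunay_adjacency(map_pts)
    n_inconsistent = diff.sum() // 2             # each edge is counted twice
    return n_inconsistent / (3 * len(real_pts) - 6)
```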
Fig. 5. Definition of the Delaunay graph error Err_del (example: Diff_del = 2 inconsistent edges, Err_del = 2/24 = 0.083).

Fig. 6. Definition of the triangle orientation error Err_ori (example: Diff_ori = 1 differing orientation out of 10 triangles, Err_ori = 1/10 = 0.10).
This criterion is reasonable when the robot uses the map for path planning based on Voronoi diagrams, because the Delaunay triangulation is the dual of the Voronoi diagram [38]; the Voronoi diagram of the obtained map coincides with that of the real object configuration if and only if Err_del = 0. In this work, we used the Bowyer-Watson algorithm [39], [40] to compute Delaunay triangulations. This incremental method starts from a supertriangle that encloses all the points, then inserts new points one by one and updates the triangulation until all the points have been added.

2) Evaluation by Triangle Orientations: Another method to evaluate the qualitative correctness of an obtained map is to count the number of triangles whose orientations (clockwise or anti-clockwise) are consistent with the real map (Figure 6) [32], [15]. The orientation of a triangle [p_1 p_2 p_3] is defined as + when the three vertices (p_1, p_2, p_3) make an anti-clockwise turn in this order, and - otherwise (i.e., when they make a clockwise turn). As there are NC3 triangles in total (the number of ways of choosing 3 of the N objects), the triangle orientation error Err_ori, i.e., the percentage of triangles with wrong orientations in the constructed map, can be defined as follows:

Err_{ori} ≡ \frac{(Number of triangles with different orientation)}{NC3}    (8)
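Since the orientation test is just the sign of a 2-D cross product, Err_ori can be computed by brute force over all triangles. A minimal sketch (assuming the real and constructed maps are numpy arrays of 2-D coordinates):

```python
import numpy as np
from itertools import combinations

def orientation(p1, p2, p3):
    """+1 for an anti-clockwise turn p1 -> p2 -> p3, -1 for clockwise
    (sign of the z-component of the 2-D cross product)."""
    d1, d2 = p2 - p1, p3 - p1
    return np.sign(d1[0] * d2[1] - d1[1] * d2[0])

def orientation_error(real_pts, map_pts):
    """Err_ori of (8): fraction of the N-choose-3 triangles whose
    orientations disagree between the real and constructed maps."""
    triangles = list(combinations(range(len(real_pts)), 3))
    wrong = sum(
        orientation(real_pts[i], real_pts[j], real_pts[k])
        != orientation(map_pts[i], map_pts[j], map_pts[k])
        for i, j, k in triangles
    )
    return wrong / len(triangles)
```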
This criterion is considered to be reasonable especially when an obtained map is used for qualitative navigation [31], [32], in which the order of visible landmark objects at each position is used as the major information source.

V. SIMULATION STUDY

A. Overview

We performed a simulation study using Cyberbotics' WEBOTS simulator (ver. 2.0). This study consists of seven experiments (Sim 1 ∼ Sim 7), which explore various aspects of the proposed method. The purpose of each experiment is as follows. Sim 1 (section V-B) examines basic characteristics of the method and compares different ways of computing the empirical distances. Sim 2 (section V-C) focuses on environmental factors, such as the number of objects and the size of the environment area. Sim 3 (section V-D) highlights the influence of robot motion parameters, such as speed and randomness of direction changes, on the resultant maps. Sim 4 (section V-E) looks at the relationship between the uniformity of environment exploration and map accuracy. Sim 5 (section V-F) evaluates performance degradation due to observation noise, i.e., object recognition errors. Sim 6 (section V-G) investigates how the camera's angle of view affects the results. Sim 7 (section V-H) compares different models of object recognition. Table III shows the parameter settings of environment, motion, observation, recognition, and map building in these experiments. In the rest of this subsection, we explain the experiment parameters in detail.

1) Environment and Objects: In this simulation study, we consider square environments of different sizes containing a finite number of objects. The side length of the square field and the number of objects vary from 1.0 [m] to 3.0 [m] and from 5 to 30, respectively. The objects are cylinder-shaped and of the same size (height: 160 [mm], diameter: 48 [mm]). Each object has a unique ID number, such as ob_0, ob_1, ..., ob_{N-1}. In each experiment, we prepare 5 or 10 different layout patterns and average the results. For example, Figure 7 illustrates 4 different layout patterns of the environment with 10 objects and 1.5 [m] side length. The dotted lines in the figures represent the edges of the Delaunay triangulations. In section V-C (Sim 2), we will discuss the influence of the number of objects and the size of the field on the performance of map building.
TABLE III
PARAMETER SETTINGS IN SIMULATION STUDY

(a) Mapping, environment, and motion parameters:

Sim        | Distance Function | Triangle Ineq. Constraint | Num. Objects | Env. Size [m×m]     | Direction Change [deg] | Distance per Step [mm]
1          | inv / log         | Y / N                     | 10           | 1.5 × 1.5           | ±45                    | 100
2-A        | log               | Y                         | 5 ∼ 30       | 1.5 × 1.5           | ±45                    | 100
2-B        | log               | Y                         | 15           | 1.0 × 1.0 ∼ 3.0 × 3.0 | ±45                  | 100
3-A        | log               | Y                         | 15           | 1.5 × 1.5           | ±0, ±45, ±90           | 100
3-B        | log               | Y                         | 15           | 1.5 × 1.5           | ±45                    | 50 ∼ 300
4          | log               | Y                         | 10           | 1.5 × 1.5           | ±45                    | 100
5-A,B      | log               | Y                         | 15           | 1.5 × 1.5           | ±45                    | 100
6-A,B,C, 7 | log               | Y                         | 30           | 1.5 × 1.5           | ±45                    | 100

(b) Sensing and visibility parameters:

Sim     | Non-recog. Error Rate [%] | Mis-recog. Error Rate [%] | Camera View Angle [deg] | Visibility Threshold [deg] | Identical / Different | Steepness of Threshold Function
1,2,3,4 | 0.0                       | 0.0                       | 360                     | 5.0                        | Identical             | ∞
5-A     | 0.0 ∼ 20.0                | 0.0                       | 360                     | 5.0                        | Identical             | ∞
5-B     | 0.0                       | 0.0 ∼ 10.0                | 360                     | 5.0                        | Identical             | ∞
6       | 0.0                       | 0.0                       | 90 ∼ 360                | 5.0                        | Identical             | ∞
7-A     | 0.0                       | 0.0                       | 360                     | 4.0 ∼ 8.0                  | Identical             | ∞
7-B     | 0.0                       | 0.0                       | 360                     | 4.0 ∼ 8.0                  | Different             | ∞
7-C     | 0.0                       | 0.0                       | 360                     | 5.0                        | Identical             | 50.0 ∼ ∞

Fig. 7. Four examples of layout patterns of the environment containing 10 objects (dotted lines: edges of the Delaunay triangulation).
2) Perception and Object Recognition: The robot has a camera with a resolution of 160 × 120 pixels and a horizontal angle of view of 80 degrees. In all experiments except Sim 6, we assume that it has a "virtual" angle of view of 360 degrees, obtained by rotating and capturing several images in different directions at each observation site. In Sim 6 (section V-G), we examine cases in which the angle of view is smaller than 360 degrees. For simplicity, in all experiments except Sim 7, we assume that the robot can recognize an object if its horizontal visual angle, excluding parts occluded by other objects, is larger than the recognition threshold θ_vis, fixed at 5.0 degrees (0.0873 radian). This means that an object nearer than about 550 mm from the robot is recognizable as long as it is not occluded. In Sim 7 (section V-H), on the other hand, we assume more realistic object recognition models and investigate their influence on the method. Another issue related to perception is object recognition errors due to observation noise. In Sim 5 (section V-F), we will discuss the influence of two types of recognition errors, i.e., non-recognition and misrecognition errors, on the resultant map accuracy.

3) Exploration and Motion Control Strategy: In this study, the robot was assumed to explore the environment using a simple motion control algorithm. At each position, the robot chooses the next direction randomly within the range of ±θ_r relative to the current heading direction, and proceeds a distance l_r in that direction unless it encounters an obstacle or wall, where θ_r and l_r are the parameters of this motion control rule. In all experiments except Sim 3, we set these parameters as θ_r = 45 [deg] and l_r = 100 [mm]. In section V-D (Sim 3), on the other hand, we examine how differences in motion strategies influence the map building process by comparing various combinations of the parameter values. The robot has 8 proximity sensors to detect nearby obstacles (i.e., objects and walls in this case), and avoids collisions with them.

4) Empirical Distance Function and Triangular Inequality Constraint: With regard to the empirical distance function (EDF) φ, we prepared the following two non-negative monotonically decreasing functions, φ1 and φ2:

φ1(f_{i,j}) = max(min(α1 \cdot (\frac{1}{f_{i,j}} - 1), δ²_{1,u}), δ²_{1,l})    (9)

φ2(f_{i,j}) = max(min(-α2 \cdot \log_e f_{i,j}, δ²_{2,u}), δ²_{2,l})    (10)

where it is assumed that each of the parameters α1, α2, δ²_{1,l}, δ²_{1,u}, δ²_{2,l}, δ²_{2,u} takes a positive value, and that δ²_{1,l} < δ²_{1,u} and δ²_{2,l} < δ²_{2,u}. Figure 8 shows the shapes of the empirical distance functions.
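In code, the two EDFs are one-liners around a clipping operation. A sketch (default parameter values follow the settings reported later in this subsection):

```python
import numpy as np

def phi1(f, alpha1=0.05, d2_l=0.1, d2_u=2.0):
    """EDF of (9): inverse-type function, clipped to [d2_l, d2_u]."""
    raw = alpha1 * (1.0 / f - 1.0) if f > 0 else np.inf
    return float(np.clip(raw, d2_l, d2_u))

def phi2(f, alpha2=0.3, d2_l=0.1, d2_u=2.0):
    """EDF of (10): logarithmic function, clipped to [d2_l, d2_u]."""
    raw = -alpha2 * np.log(f) if f > 0 else np.inf
    return float(np.clip(raw, d2_l, d2_u))
```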
Fig. 8. Empirical distance functions (a) φ1 and (b) φ2 used in the simulations and experiments.

Although the optimum values of the parameters in the functions are considered to be dependent on the
environmental factors and the perception/recognition capability of the robot, we used a specific set of values, α1 = 0.05, α2 = 0.3, δ²_{1,l} = δ²_{2,l} = 0.1, δ²_{1,u} = δ²_{2,u} = 2.0, throughout this simulation study to obtain consistent results. These values were chosen manually by trial and error so that good results were obtained in preliminary experiments. In Sim 1 (section V-B), we compare the two types of empirical distance functions. Aside from the empirical distance function, the triangle inequality constraint (TIC) can be used to estimate an approximate distance between a pair of objects that are rarely observed together; in Sim 1, we also examine the effect of TIC. In all the other experiments, we employ the combination of the empirical distance function φ2 and TIC, based on the results of Sim 1.
B. (Sim 1): Comparison Among Different Combinations of Distance Functions and Constraint

First, we examined the basic characteristics of the proposed method and how the selection of a distance estimation scheme affects the resultant map quality. Specifically, we compared the 4 patterns generated by the two types of empirical distance functions (φ1, φ2) with and without TIC (triangle inequality constraint). The other experiment parameters were set to fixed values, as shown in Table III. For each simulation run, the robot performs 1000 steps of observation and exploration, during which it constructs maps at 20, 50, 100, 200, ..., 900, 1000 steps. Figure 9 shows how the average qualitative map errors (Err_del, Err_ori) change as the number of steps increases. In each case, the results of 10 simulation runs with different environment layout patterns are averaged. From this figure, we can see that both types of map errors decrease as the data size increases. Among the four cases, φ2 with TIC achieved the best results. However, when the distance function φ1 was used, the difference with and without TIC was not significant.
Fig. 9. (Sim 1) Changes in map errors: comparison between the two distance functions with and without the triangle inequality constraint: (a) Delaunay graph error, (b) triangle orientation error.
Figure 10 shows a real configuration of objects used in this experiment (a), and the maps constructed at 20, 100, and 300 steps (b)-(d) with the combination of φ2 and TIC. These figures show that the map becomes more accurate as the size of the observation data increases.

Fig. 10. (Sim 1) An example of landmark configuration and constructed maps: (a) real configuration; constructed maps after (b) 20 steps, (c) 100 steps, (d) 300 steps.

C. (Sim 2): Comparison Among Different Numbers of Objects and Area Sizes

Next, we examined the influence of environmental factors on the map construction performance by varying the number of objects and the area size. The other parameters, related to motion, observation, recognition, and mapping, were set to fixed values as shown in Table III.

1) (Sim 2-A) Numbers of Objects: First, we varied the number of objects from 5 to 30 at intervals of 5, while the area size was fixed at 1.5 [m] × 1.5 [m]. For each number of objects, we conducted 5 simulation runs with different object configurations and averaged the results. Figure 11 shows the changes in the qualitative errors (Err_del, Err_ori) in these 6 environment settings. Although the final errors and the numbers of steps to converge tended to increase as the number of objects increased, the increases were not very large. In particular, the figure on the right shows that the number of objects made little difference to the triangle orientation error Err_ori as long as it was between 10 and 30. Figure 12 illustrates one of the actual configurations, containing 30 objects, and a map constructed after 500 steps in that environment.

Fig. 11. (Sim 2-A) Changes in map errors: comparison among various numbers of objects: (a) Delaunay graph error, (b) triangle orientation error.

Fig. 12. (Sim 2-A) (a) An example of object configuration with 30 landmarks and (b) a constructed map after 500 steps (Err_del = 0.083, Err_ori = 0.024).

2) (Sim 2-B) Area Sizes: Second, we varied the side length of the area from 1.0 [m] to 3.0 [m] at intervals of 0.5 [m], with the number of objects fixed at 15. Figure 13 shows the changes in map errors for the 5 different area sizes. As in the previous case (Sim 2-A), we performed 5 runs for each area size, changing the object layout, and averaged the results. From the figures, it can be seen that the map errors and the number of steps to converge tend to increase as the environment size increases. We consider this to be mainly because (1) the chance of observing two objects together decreases as the layout becomes sparser, and (2) the exploration takes more time as the environment becomes larger. Interestingly, there was an exception to this trend: the map errors in the 1.0 [m] × 1.0 [m] environment were larger than those in the 1.5 [m] × 1.5 [m] environment, although the former environment is one size smaller than the latter. We consider this to be because the robot's exploration was severely obstructed in the former environment, which is overcrowded with objects.

Fig. 13. (Sim 2-B) Changes in map errors: comparison among various environment area sizes: (a) Delaunay graph error, (b) triangle orientation error.
Fig. 14. (Sim 3-A) Changes in map errors: comparison among various move decision strategies (range of direction change θ_r): (a) Delaunay graph error, (b) triangle orientation error.
D. (Sim 3) Comparison Among Different Motion Strategies

In this simulation, we examined how the robot's motion strategies influence the map construction by varying the values of the two motion-related parameters θ_r and l_r, while the other parameter settings were fixed (see Table III).

1) (Sim 3-A) Motion Strategy 1: Randomness of Direction of Movement: First, we compared three strategies with different values for the range of direction changes, θ_r = 0, 45, 90 [deg]. A larger value of θ_r means that the robot motion has more randomness in the choice of direction at each step. The moving distance for one step, l_r, was fixed at 100 [mm] in all cases. Figure 14 shows how the qualitative map errors changed as the number of steps increased in these cases. These results indicate that the case of θ_r = 45 [deg] gave the best performance in terms of final map accuracy and learning speed. On the other hand, the case of θ_r = 0 [deg] gave the poorest long-term performance, although its map errors in the initial phase (approx. 50-100 steps) were smaller than in the other two cases. In contrast, although the map errors in the case of θ_r = 90 [deg] were as small as those in the case of θ_r = 45 [deg] in the long term, the learning speed (map accuracy improvement) was markedly slower in the early phases.

This result can be explained by considering the differences in their patterns of environment exploration. Figure 15 shows examples of motion traces for 100 steps (lines) and observation positions for 500 steps (white circles) in each case. In the case of θ_r = 0 [deg] (left), as the robot regularly moves along the outer walls, the observation positions are concentrated along the edges of the field, and a large part of the environment remains unexplored even after 500 steps. This results in the poor performance. In the case of θ_r = 90 [deg] (right), the distribution of the observation positions becomes asymptotically uniform over the whole area after a sufficient number of steps. However, in the early phase, the exploration is inefficient because the robot occasionally becomes stuck in a local area due to the highly random nature of its motion. On the other hand, in the case of θ_r = 45 [deg] (middle), the exploration pattern is well balanced in both the early and later phases as compared to the other two cases. The better performance is considered to be due to this exploration efficiency.

Fig. 15. (Sim 3-A) Traces of the robot for 100 steps (solid lines) and distributions of observation points for 500 steps (white circles) with different move decision strategies: (a) θ_r = 0 [deg], l_r = 100 [mm]; (b) θ_r = 45 [deg], l_r = 100 [mm]; (c) θ_r = 90 [deg], l_r = 100 [mm].

2) (Sim 3-B) Motion Strategy 2: Distance of One Step: Next, we compared 4 strategies with different values for the distance of a one-step move, l_r = 50, 100, 200, 300 [mm]. This time, θ_r was fixed at 45 [deg]. Figure 16 shows the changes in map errors in each case. The strategies with longer step lengths (i.e., l_r = 200, 300 [mm]) were more advantageous in the early phase because a larger area of the environment was explored. However, the differences among them were gradually reduced as the number of steps increased, because the whole area was eventually covered by all of the strategies. We consider the results and analysis in this section regarding the robot motion strategy to be qualitatively reasonable and to have some generality, although the optimum values of θ_r and l_r may well depend on environmental factors such as the area size and the number of objects.
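For reference, the exploration rule compared in this section fits in a few lines. A hypothetical sketch (obstacle avoidance by the proximity sensors is omitted; θ_r in degrees and l_r in mm, as in the text):

```python
import numpy as np

rng = np.random.default_rng()

def next_pose(x, y, heading, theta_r=45.0, l_r=100.0):
    """One exploration step: turn by a uniform random angle within
    +/- theta_r [deg], then advance l_r [mm] (collision checks omitted)."""
    heading += np.deg2rad(rng.uniform(-theta_r, theta_r))
    return x + l_r * np.cos(heading), y + l_r * np.sin(heading), heading
```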
Fig. 16. (Sim 3-B) Changes in map errors: comparison among various move decision strategies (moving distance for one step l_r): (a) Delaunay graph error, (b) triangle orientation error.
E. (Sim 4): Relationship Between Observation Data and Map Quality

The above results imply that the quality of a map is mainly determined by the distribution of observation locations. We therefore examined the relationship between the characteristics of the observation data and the resultant map errors in the following manner. First, we prepared an environment with a 1.5 [m] × 1.5 [m] area containing 10 objects. Then, we performed 50 simulation runs in this environment with initial robot positions chosen at random. Each run consisted of 100 steps. The other parameter settings are shown in Table III.

1) Map Quality and Explored Area: First, we examined the relationship between the map quality and how much of the entire area of the environment is explored by the robot. In this analysis, we used the mean nearest neighbor distance W as a measure for evaluating the spatial distribution of the observation points, which is calculated as:

W = \frac{1}{m} \sum_i \min_{j \neq i} dist_{i,j}    (11)

where m denotes the number of points (100 in this case), and dist_{i,j} denotes the distance between the i-th and j-th points. In general, a point distribution pattern with a smaller W value indicates a clustered or concentrated distribution, and a pattern with a larger W indicates a uniform or scattered distribution. Figure 17 shows the relationships between the mean nearest neighbor distance W (X-axis) and the qualitative map errors Err_del, Err_ori (Y-axis) in this simulation. Least-squares fitting lines and correlation coefficients are also shown in these figures. There was a general trend for the map accuracy to increase as the observation points of the exploration path were scattered over a larger area.
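Equation (11) translates directly into code. A minimal sketch for a set of 2-D observation points:

```python
import numpy as np

def mean_nearest_neighbor_distance(points):
    """W of (11): average over points of the distance to the nearest other point."""
    pts = np.asarray(points)
    # Pairwise Euclidean distances; the diagonal is masked out with +inf.
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    return d.min(axis=1).mean()
```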
Fig. 17. (Sim 4) Relation between mean nearest neighbour distance and map errors: (a) Delaunay graph error, (b) triangle orientation error (least-squares fits: y = -23.1x + 1.24 with correlation -0.533, and y = -15.3x + 0.834 with correlation -0.567).

Fig. 18. (Sim 4) Relation between entropy of object observation frequency and map errors: (a) Delaunay graph error, (b) triangle orientation error (least-squares fits: y = -1.28x + 3.01 with correlation -0.720, and y = -0.613x + 1.44 with correlation -0.777).
2) Map Quality and Equality of Object Observation Frequencies: Next, we focused on the relationship between the distribution of object observation frequencies and the map errors. We defined the entropy of the distribution of object observation frequencies as:

\mathrm{Ent} = - \sum_{i} \frac{n_i}{n} \log \frac{n_i}{n} \qquad (12)

where n_i denotes the number of times the i-th object was observed, and n denotes the sum of n_1, ..., n_N (n = \sum_i n_i).
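A minimal sketch of Equation (12) (our illustration; the counts n_i come from the robot's observation history):

```python
import numpy as np

def observation_entropy(counts):
    """Equation (12): Ent = -sum_i (n_i / n) * log(n_i / n), where
    counts[i] is the number of times object i was observed."""
    n_i = np.asarray(counts, dtype=float)
    p = n_i[n_i > 0] / n_i.sum()     # unobserved objects contribute zero
    return float(-(p * np.log(p)).sum())
```

The entropy is maximized when all objects are observed equally often, which is exactly the condition the following analysis finds desirable.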
Figure 18 shows the relationships between the entropy of the object observation frequencies and the qualitative map errors (Errdel and Errori). As shown in these figures, there are strong negative correlations between them, which means that an exploration path that observes all of the
objects with equal frequency is desirable for this map building method.

F. (Sim 5) Influence of Object Recognition Errors

The previous simulation results were based on the assumption that there are no errors in the object recognition process. In a real environment, however, occasional failures of object recognition are almost inevitable due to various types of uncertainty. Therefore, in this experiment, we examined how recognition errors affect the performance of the proposed map learning method. More specifically, we considered two types of recognition error, non-recognition and misrecognition, which were generated artificially by applying the following operations to the original observation data at each position (a list of visible objects Λo); a minimal sketch of this error injection follows the list:
• Non-recognition error: a specified percentage of the elements in Λo is chosen at random and removed.
• Misrecognition error: a specified percentage of the elements in Λo is chosen at random and replaced with other object IDs, which are also chosen randomly.
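The sketch below is our illustration of the error injection; it applies per-element probabilities p_nonrec and p_misrec, which approximates the paper's "specified percentage chosen at random".

```python
import random

def corrupt_observation(visible_ids, all_ids, p_nonrec, p_misrec):
    """Apply the two artificial error types to one observation list
    Lambda_o: each visible object is dropped with probability p_nonrec
    (non-recognition) or replaced by a randomly chosen wrong object ID
    with probability p_misrec (misrecognition)."""
    corrupted = []
    for obj in visible_ids:
        r = random.random()
        if r < p_nonrec:
            continue                             # non-recognition: dropped
        elif r < p_nonrec + p_misrec:
            wrong = random.choice([i for i in all_ids if i != obj])
            corrupted.append(wrong)              # misrecognition: wrong ID
        else:
            corrupted.append(obj)                # recognized correctly
    return corrupted
```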
Misrecognition error is expected to be more disturbing than non-recognition error, making it more difficult for the robot to build an accurate map. In this experiment, we set the number of objects to 15 and the size of the environment area to 1.5[m]×1.5[m]. The other parameters were set to the fixed values shown in Table III.

1) (Sim 5-A) Influence of Non-recognition Error: Figure 19 shows how the map errors change as the number of steps increases when the non-recognition rate is set to 0.0 (i.e., no error), 3.0, 10 and 20 [%]. In each case, we performed 25 simulation runs (5 runs for each of 5 layout patterns) and averaged the results. In the early exploration phase, the map errors Errdel and Errori clearly increased as the non-recognition rate grew. However, the differences became smaller and almost negligible as the number of steps increased. For example, the average difference between Errori with 20% non-recognition error and that with no recognition error after 1000 steps was only 1.80%. Therefore, the proposed method is very robust against this type of object recognition error.

2) (Sim 5-B) Influence of Misrecognition Error: Figure 20 shows the results when the misrecognition rate was set to 0.0 (no error), 0.5, 2.0 and 10 [%]. As expected, the influence of this type of recognition error was greater than that of non-recognition error. For example, the average difference in Errori between the non-error case and the 10% misrecognition case remained at 3.40% even after 1000 steps.
Fig. 19. (Sim 5-A) Changes in map errors: comparison among various non-recognition rates: (a) Delaunay graph error, (b) triangle orientation error

Fig. 20. (Sim 5-B) Changes in map errors: comparison among various misrecognition rates: (a) Delaunay graph error, (b) triangle orientation error
However, when the misrecognition error was 2.0%, Errori increased by only 0.28% after 1000 steps. Therefore, we concluded that the proposed method is considerably robust against this type of recognition error as well.

G. (Sim 6) Comparison Among Different View Angles of the Camera

Until now, we have assumed that the robot always has a 360-degree angle of view. In this experiment, we examined how the map building performance is affected when only a smaller angle of view is available. Specifically, we set the angle of view of the camera to 90, 120, 180 and 360 [deg], and compared the resulting map errors. Details of the other parameter settings are shown in Table III. In each case, we conducted 5 simulation runs with different object layout patterns and averaged the results.
Fig. 21. (Sim 6) Change of map errors: comparison of various view angles of the camera: (a) against the number of steps, (b) against the amount of object co-visibility information
Figure 21 (left) shows how the average triangle orientation error (Errori) changes as the number of steps increases in each case. From this figure, it can be seen that a robot with a larger angle of view constructs a more accurate map than a robot with a smaller angle of view after the same number of steps. However, we should not jump to the conclusion that a decrease in the angle of view necessarily causes a performance deterioration, because a difference in the angle of view also means a difference in the number of visible objects, i.e., in the amount of information. That is to say, the above result may be due to the difference in the amount of information between the cases, rather than to the difference in the angle of view itself. In fact, the amount of object co-visibility information is considered to be approximately proportional to the square of the angle of view, as the expected number of visible objects at a step is proportional to the angle of view, and an observation of m visible objects gives \binom{m}{2} = m(m-1)/2 object pairs. Based on this idea, we re-plotted the qualitative map error in each case against the amount of object co-visibility information, instead of the number of steps (Figure 21 (right)). From this figure, we can see that all the cases lie on a single curve, which means that the accuracy of a constructed map is determined by the amount of object co-visibility information, regardless of the angle of view of the camera.
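The quadratic growth of co-visibility information with the number of visible objects can be seen directly in how a co-visibility frequency matrix is accumulated. The sketch below is our illustration of such an update; the matrix C is the raw input from which the empirical distances are later estimated:

```python
from itertools import combinations

import numpy as np

def update_covisibility(C, visible_ids):
    """Add the C(m, 2) = m*(m-1)/2 co-visible pairs contributed by one
    observation (a list of m recognized object IDs) to the symmetric
    co-visibility frequency matrix C."""
    for i, j in combinations(sorted(visible_ids), 2):
        C[i, j] += 1
        C[j, i] += 1

C = np.zeros((10, 10))
update_covisibility(C, [0, 2, 5, 7])   # 4 visible objects -> 6 pairs
```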
H. (Sim 7) Comparison Among Different Object Recognition Models

In the previous experiments, except Sim 5 in which recognition errors were considered, we assumed that each object is recognizable if and only if its horizontal visual angle is larger than the threshold θvis = 5.0 [deg]. This simplistic model of object recognition raises the three questions below:
(1) To what extent is the performance of the proposed method dependent on the specific value (5.0 [deg]) of the threshold parameter θvis?
(2) What if each object has a different threshold value for recognition, rather than an identical one?
(3) What if the object recognition rate or probability is a smooth increasing function of an object's visual angle, rather than a step function?
In this section, we present the results of experiments performed to answer these questions. The parameter settings in these experiments are summarized in Table III.

1) (Sim 7-A) Recognition Model 1: Deterministic Model with Identical Recognition Threshold Angle: First, we examined how the value of the recognition threshold θvis influences the map building results by varying it between 4.0 [deg] and 8.0 [deg]. A small threshold value means that objects are easily recognized by the robot even if they are distant or partially occluded by other objects, while a large threshold means that only near and non-occluded objects are recognizable. It should be noted that all objects are still assumed to share an identical recognition threshold in this experiment. A minimal sketch of this recognition model is given after Figure 22.

Figure 22 shows how the triangle orientation error (Errori) changes as the number of steps increases for each recognition threshold value. As can be seen from this figure, smaller threshold values give better results when the number of steps is very small. This is thought to be because sufficient data are collected more easily with smaller thresholds, as more objects can be recognized at a time. As the number of steps grows, however, the differences among the cases become very small, and the final error is smallest when the threshold is set to 6.0 [deg]. This result implies that the amount of valid information for map building decreases not only when there are too few visible objects, but also when there are too many.

2) (Sim 7-B) Recognition Model 2: Deterministic Model with Arbitrary Recognition Threshold Angles: The above assumption that all objects share an identical minimum visual angle θvis for recognition may be too restrictive. In the real world, it is more natural to assume that the degree of recognizability varies from object to object, as each object has its own properties such as shape, colour, and so on. Therefore, we next examined how the performance of the method changes when each object has an arbitrary minimum recognition visual angle θvis,i within a specified range. Specifically, we compared the following four patterns: (1) all objects have an identical recognition threshold θvis = 5.0 [deg]; each object has an arbitrary threshold θvis,i within the range of (2) 5.0–7.0 [deg], (3) 4.5–7.5 [deg], and (4) 4.0–8.0 [deg]. Table IV shows the recognition threshold (minimum visual angle) of each object in experiment pattern (4); these values were randomly chosen within the specified range.
Fig. 22. (Sim 7-A) Change of map errors: deterministic object recognition model with identical threshold angle
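As a minimal sketch of the Sim 7-A model (our illustration; occlusion handling, which the simulator also applies, is omitted for brevity):

```python
import math

def is_recognizable(object_xy, object_diameter, robot_xy, theta_vis_deg):
    """Deterministic recognition model: an object is 'visible' iff its
    horizontal visual angle exceeds the threshold theta_vis."""
    dist = math.hypot(object_xy[0] - robot_xy[0],
                      object_xy[1] - robot_xy[1])
    # Full horizontal visual angle subtended by a cylinder of this diameter.
    visual_angle = 2.0 * math.atan2(object_diameter / 2.0, dist)
    return visual_angle > math.radians(theta_vis_deg)
```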
TABLE IV
(Sim 7-B) List of minimum visual recognition angles θvis,i [deg] for the 30 objects in case (4). Values were decided randomly in the range of [4.0, 8.0].

Obj. ID  0–9 :  7.62  7.32  4.49  7.29  4.45  5.16  4.17  7.01  5.44  7.70
Obj. ID 10–19:  4.64  4.62  5.32  4.20  7.90  6.93  5.02  6.38  6.46  4.00
Obj. ID 20–29:  4.47  5.24  4.32  4.88  7.79  6.93  5.71  6.34  6.74  6.67
It should be noted that the difference in the degree of recognizability causes an imbalance in the number of times each object is observed. Figure 23 shows the change of the triangle orientation error Errori with the number of steps in each case. Although this indicates that a wider range of recognition threshold values results in poorer performance, the deterioration is not severe.

3) (Sim 7-C) Recognition Model 3: Probabilistic Object Recognition Model: So far, we have assumed a deterministic object visibility model in which an object is visible (recognizable) if its visual angle is larger than a threshold θvis and invisible (unrecognizable) otherwise. In other words, it has been assumed
Fig. 23. (Sim 7-B) Change of map errors: deterministic object recognition model with arbitrary threshold angles
that the probability P_{ob_i,vis} that an object i is visible can be represented by a step function:

P_{ob_i,vis}(vh_i) = \begin{cases} 0 & (vh_i < \theta_{vis}) \\ 1 & (vh_i \geq \theta_{vis}) \end{cases} \qquad (13)

where vh_i stands for the horizontal visual angle of the object.
In the real world, however, it is more realistic to consider that P_{ob_i,vis} increases continuously with vh_i. In this experiment, we therefore consider a probabilistic object recognition model in which P_{ob_i,vis} is represented by a logistic function:

P_{ob_i,vis}(vh_i) = \frac{1}{1 + e^{-K (vh_i - \theta_{vis})}} \qquad (14)
where K is a parameter characterizing the steepness of the function near the threshold θvis. Note that this model asymptotically approaches the conventional deterministic model (Equation 13) as K approaches infinity; for a small K value, on the other hand, the probability that an object is recognized increases only slowly with vh_i. We examined how the accuracy of the constructed maps is influenced by the value of the steepness parameter K, with the other parameter θvis fixed at 5.0 degrees. Figure 24 shows the shape of P_{ob_i,vis}(vh_i) as a function of vh_i for each case of K = 50, 100, 200, ∞ (note that vh_i is given in radians here). Figure 25 shows the change of the triangle orientation error Errori with the number of steps in each case. From this figure, we can see that there is no significant difference in the resultant performance between the deterministic and probabilistic recognition models as long as K is not too small.
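A minimal sketch of the probabilistic model of Equation (14) (our illustration):

```python
import math
import random

def recognition_probability(vh, theta_vis, K):
    """Equation (14): logistic recognition probability as a function of
    the horizontal visual angle vh (in radians)."""
    return 1.0 / (1.0 + math.exp(-K * (vh - theta_vis)))

def is_recognized(vh, theta_vis=math.radians(5.0), K=100.0):
    """Sample one recognition outcome; as K -> infinity this recovers
    the step function of Equation (13)."""
    return random.random() < recognition_probability(vh, theta_vis, K)
```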
Fig. 24. Probabilistic object recognition models used in Sim 7-C. The threshold value θvis is set to 5 [deg] (0.0873 [rad])

Fig. 25. (Sim 7-C) Change of map errors: probabilistic object recognition model
VI. EXPERIMENT WITH A REAL ROBOT

A. Overview

To examine the applicability of the simulation results to real environments, we performed an experiment in an indoor environment using a real mobile robot (RWI B14r). The robot was equipped with 16 sonar sensors around its body and had a color camera mounted on top. The environment was a rectangular field of 3.5[m] × 4.0[m] containing 6 landmark objects. Each landmark was a cylindrical object (height, 720–840[mm]; diameter, 210–300[mm]) with a unique color (yellow, yellow-green, light blue, pink, green, or red), making the visual recognition very simple. Fig. 26 is a snapshot of this experiment, and the object positions in this environment are shown in Figure 27.

In this experiment, we employed a simple visual object recognition method based on shape and color
Fig. 26. Experiment environment

Fig. 27. Configuration of the 6 objects in the experiment environment
matching. Briefly, raw images were first divided into a finite set of regions with consistent colors after some preprocessing, such as edge detection and smoothing. Then, regions occupying more than 10% of the area of the original image and having two vertical edges were regarded as landmark objects. Finally, each detected landmark was identified by its color information. An offline test indicated that the actual object recognition rate in this environment was about 90%.

The motion and observation of the robot were based on the procedure described in section IV-A. The length of each step was set to 0.75[m], and the robot changed its direction at random within the range of ±137.5[deg] every two steps. When it approached an obstacle within a distance of 0.3[m], it also changed its direction of movement to avoid collision. At each observation location, the robot turned through 360[deg], taking several images and joining them into one panoramic image.
We conducted 3 experimental trials, each of which consisted of 100 steps.
Fig. 28. (Exp) Changes in map errors: comparison among different distance functions with / without triangle inequality constraint: (a) Delaunay graph error, (b) triangle orientation error
B. Various Combinations of Distance Functions and Triangular Inequality Constraints

As in section V-B, we compared the 4 different distance estimation schemes (φ1 or φ2, with or without TIC) in the experimental environment. Figure 28 shows the average results of the three trials. These results indicate that the distance function φ2 is slightly better than φ1 and that the triangular inequality constraint (TIC) improves the performance in either case. These results are consistent with those of the simulation discussed in section V-B. Figure 29 shows maps built using φ2 with TIC after 10 and 100 steps in the first run. Surprisingly, a map with zero Delaunay error (Errdel) was obtained after only 10 steps in this run. In addition, the triangle orientation error (Errori) also became zero after 100 steps.

C. Influence of Recognition Errors and Non-uniformity of Landmark Observations

Although Errdel was 0.0 after only 10 steps in the first trial as described above, in the second and third trials the errors were 0.42 and 0.33 after 10 steps, respectively, and became zero after 30 steps in both cases. Figure 30 shows the maps built after 10 steps in the second and third trials. There are two possible explanations for the poor results in the early phases of these trials: non-uniformity of the observed landmarks in the data, and misrecognition or non-recognition of landmark objects. Table V shows the observation data (history of observed objects) of the first 10 steps in the 1st and 3rd trials. In the 3rd trial, object 3 appeared only once whereas object 2 appeared 9 times, implying that the exploration was concentrated in the upper-right area of the environment.
Fig. 29. (Exp) Constructed map after (a) 10 steps (Errdel = 0.00, Errori = 0.00) and (b) 100 steps (Errdel = 0.00, Errori = 0.10) in Trial 1

Fig. 30. (Exp) Constructed map after 10 steps in (a) Trial 2 (Errdel = 0.42, Errori = 0.10) and (b) Trial 3 (Errdel = 0.33, Errori = 0.35)
In addition, the marked object IDs in Table V were found to have been misrecognized. These results indicate that non-uniformity of object observations and misrecognition errors markedly degrade the quality of the built maps in the early exploration phase.

VII. DISCUSSION

In this section, we discuss the appropriateness of the assumptions underlying the proposed method, and several issues that remain to be resolved for the development of practical applications.
TABLE V
(Exp) Lists of observed objects in (a) Trial 1 and (b) Trial 3. Objects marked with * were considered to have been misrecognized.

Step   (a) Trial 1      (b) Trial 3
 1     0, 1, 2          2, 4, 5
 2     0, 1, 2          2, 4, 5
 3     1, 2, 5          0*, 2, 4, 5
 4     2, 5             0, 1
 5     2, 4, 5          0, 1, 2
 6     2, 4             0, 1, 2
 7     3, 4             0, 2, 4
 8     2, 4             0, 1, 2
 9     2, 3, 4          1*, 2, 3, 4
10     2, 3             1, 2
A. Object Recognition

The proposed map building method depends on the assumption that the probability of recognizing an object decreases as its distance from the camera increases. As discussed in section III-D, we consider this assumption to be mostly valid, because as the distance between the camera and an object increases, the information available for object recognition and identification decreases due to the reduction in image size and the higher probability of occlusion. That is, both non-structured and structured noise in the image are expected to increase with distance.

In the field of computer vision and pattern recognition, there have been many studies of how object recognition performance is affected by such image distortion. For example, in model-based object recognition, both theoretical and experimental performance analyses have been performed by many groups [41], [42], [43], [44], [45]; another example is the experimental analysis of appearance-based recognition methods [46]. All of these studies indicated that the probability of correct recognition is reduced by distortion of object images. Based on these results, we conducted most of the experiments with a simplified model of the object recognition process, in which a landmark object was assumed to be "visible" if its image size was larger than a threshold value. To compensate for this simplicity, we performed additional simulations considering recognition errors (section V-F) and slightly more complex recognition models (section V-H), and examined how the resultant map quality is affected.

However, the actual process of visual object recognition is more complicated, and the probability of correct recognition depends not only on the distance between the camera and the object, but also on various conditions, such as the relative poses of the objects to the camera, illumination, and the shapes of
the objects themselves. Therefore, it is necessary to examine the validity of the proposed method under more realistic conditions in future studies. Another issue regarding object recognition in the method is the assumption that each object can be uniquely identified by its appearance. In a real environment, however, a robot often suffers from the aliasing problem, in which two or more objects cannot be distinguished by instantaneous observations. This problem could be solved to some extent by applying hidden state estimation methods such as HMMs to historical information on observations and locations.

B. Heuristics of Co-visibility and Spatial Distance

The central concept of our map building method is the co-visibility heuristic that if two objects are seen together frequently, then they are likely to be located close to each other. While section III-E discussed the validity of this heuristic in a simplified environment, we discuss its significance here from other viewpoints.

First, as co-visibility of objects can be regarded as the co-occurrence of the events that the objects are visible, there have been many previous studies on learning behaviors and models based on co-occurrence information. The most basic principle related to this issue is Hebb's rule, which postulates that if two events co-occur often, the connection between them should be strengthened. Many recent reinforcement learning methods, such as Q-learning [47], are also based on the co-occurrence or temporal proximity of behavior and reward events. Interestingly, there is another similarity between reinforcement learning and our CoviMap: in the former, correct behavior rules are learned from discrete reward events, while in the latter, a qualitatively accurate map is learned from discrete co-visibility events among objects.

In the field of information retrieval, co-occurrence information of words [48] and documents [49] has been utilized to estimate their similarity and distance; the words and documents in these studies correspond to the objects in our system. In some of these studies, MDS and its variants are also used to make a "map" of documents and words based on co-occurrence frequencies, to visualize the relationships between them in an intuitive manner. This implies that people take it for granted that there is a strong correlation between the co-occurrence and semantic distance of objects.

The above heuristic can also be regarded as a special case of a more general one: "temporal and spatial proximities are approximately equivalent in many environments." In research concerning human spatial memory, it has been reported that humans often rely on temporal rather than spatial proximity to
DRAFT
IEEE TRANSACTIONS ON SMC, PART B, VOL. X, NO. X, NOVEMBER X
39
memorize the spatial structures of environments [50].

In this paper, we focused on co-visibility information and did not use other sensor information for map building. However, actual robot sensors provide a variety of information, such as range data, which could be used to estimate the spatial distances among objects either instead of or in combination with the co-visibility information.

C. Map Building by Multi-dimensional Scaling

As mentioned previously, multi-dimensional scaling (MDS) is a well-known technique for constructing a layout model of objects when the dissimilarities among them are given, and it has been applied successfully in cognitive map research as a means of analyzing the spatial knowledge of human subjects [51]. One contribution of the present study is to demonstrate that MDS can also be applied successfully to an autonomous map building task by a mobile robot. Courrieu [52] applied an MDS-based data embedding algorithm to a robot mapping / navigation task as an application example. However, that work is fundamentally different from the present study in that it assumed an extremely simple environment, and the distance matrix was computed by means of a standard triangulation technique from quantitative measurements of the angular distances between objects.

While we used the classical scaling method [37] in combination with empirical distance functions to generate the 2D coordinates of the objects from their co-visibility frequencies (a minimal sketch of classical scaling follows the list below), it has certain limitations:
• Both the function type and the parameter values of the empirical distance function must be determined beforehand.
• All elements of the distance matrix are assumed to have the same weight; that is, all distance estimates among the objects are treated equally, regardless of differences in the reliability of each estimate.
• Computation of eigenvalues and eigenvectors tends to be unstable when sufficient data are not available.
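For reference, the classical scaling step itself is compact. The following is a minimal sketch (our illustration) of classical (Torgerson) scaling, which recovers 2D coordinates from an N × N matrix D of estimated inter-object distances by double-centering and eigendecomposition:

```python
import numpy as np

def classical_mds(D, dim=2):
    """Classical (Torgerson) scaling: B = -0.5 * J @ D**2 @ J with the
    centering matrix J = I - (1/N) * 11^T; the top 'dim' eigenpairs of B
    give the embedded coordinates."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    w, V = np.linalg.eigh(B)                # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:dim]         # take the largest 'dim'
    w_top = np.clip(w[idx], 0.0, None)      # guard against small negatives
    return V[:, idx] * np.sqrt(w_top)
```

Note that the recovered layout is determined only up to rotation, reflection, and translation, which is consistent with the purely qualitative nature of the input.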
Fortunately, it is possible to overcome these problems by using more sophisticated MDS methods, such as SMACOF and ALSCAL [35], [36], together with the following enhancements:
1) Automatic parameter selection: the update of the 2D object coordinates by MDS and the optimization of the empirical distance function parameters by least-squares fitting are repeated alternately until convergence. In addition, non-metric MDS can be used to obtain the map directly from the ordinal information of the co-visibility frequencies.
2) Weighting of the distance matrix: the weight on each element of the distance matrix is determined in proportion to the reliability of the estimate; that is, higher weights are assigned to pairs of objects that are observed together more often.
Some of these improvements have been implemented in the latest version of CoviMap [53], [54]. Furthermore, it is possible to use methods other than MDS to reproduce the 2D coordinates of the objects from the distance matrix. For example, the FastMap algorithm [55] proposed by Faloutsos has the advantage of being linear in the number of objects, whereas MDS is quadratic. The spring embedder model [56], [57], which has often been used for data visualization, can also be applied either instead of or in combination with MDS for this purpose. In particular, by combining MDS and the spring embedder method, we may derive an online algorithm for map building, whereas the present method is basically a batch process.

D. Map Building in a Large Environment

As implied by the simulation results in section V-C, two practical problems must be considered when the environment becomes much larger and contains more landmark objects. First, if the environment becomes larger, the travel distance required to explore the whole area increases, which increases both the time and the cost of building the map, because the current implementation of the method is basically a "batch" process. Another issue is that the computational complexity of map building by MDS is about O(N³), where N is the number of objects, i.e., the dimension of the distance matrix.
A reasonable approach to these problems is to divide the whole environment into a set of smaller regions, construct a local map for each of them, and then complete a global map by integrating them. Many similar approaches have already been proposed for metric map building tasks [1], [30], [58]. By applying this technique to our method, the map building process is expected to become incremental, and the computational complexity should be significantly reduced.

E. Exploration Strategy

In this study, we assumed that the robot explores the environment using a very simple motion control algorithm based on a combination of random walking and collision avoidance. Meanwhile, the simulation results shown in section V-E indicated that the quality of the maps built by this method is highly correlated with the size of the explored area and the variation in the observation data. This means that a more systematic exploration strategy is needed for efficient map construction, i.e., a more accurate map with less time and travel distance.
There have been a number of studies regarding this environment exploration problem [59], [60], [61]. The basic idea is to select, at each step, a motion that gains as much new information about the environment as possible. In our case, a similar exploration strategy that gives high priority to areas containing less frequently observed landmarks can be considered. Another issue regarding exploration is when to stop exploring the environment. As the results in section V show, the number of steps required for the map errors to converge depends heavily on environmental factors and the robot's abilities. Although it would be ideal to stop when the accuracy of the map reaches a prespecified level, we have not found a theory that provides upper / lower bounds on the accuracy of a map under construction. A reasonable alternative would be to stop exploring when the changes in the empirical distances among the objects become smaller than a specified threshold.

VIII. CONCLUSION

In this paper, we proposed a novel map building method for mobile robots based on co-visibility, which provides very coarse and qualitative information regarding "whether two objects are visible at the same time or not." The method utilizes a simple heuristic rule, "a pair of objects observed at the same time more frequently are likely to be located more closely together," and a common multivariate data analysis technique known as multi-dimensional scaling (MDS). Compared with conventional metric map building methods, this method has the noteworthy feature that it requires neither quantitative and precise sensor measurements nor the robot's own position. Thus, it is possible to construct a reasonably accurate map even from a set of low-quality information, as long as a sufficient amount of data is available. Due to these characteristics, our method is expected to be advantageous when there is insufficient a priori information about the environment and the robot itself.

In the simulation study, we examined the characteristics of this method while changing parameter values such as the number of objects, the environment size, and the degree of observation uncertainty. We also performed a basic experiment with a real robot in a simplified environment. In future studies, we intend to examine the issues discussed in the previous section and make our system applicable to more complex environments.

REFERENCES

[1] P. Hebert, S. Betge-Brezetz, and R. Chatila, "Decoupling odometry and exteroceptive perception in building a global world map of a mobile robot: The use of local maps," in Proc. IEEE Int. Conf. Robotics Automat., 1996, pp. 757–764.
[2] J. Uhlmann, S. Julier, and M. Csorba, "Nondivergent simultaneous map-building and localization using covariance intersection," in Proc. SPIE: Navigation and Control Technologies for Unmanned Systems II, 1997, pp. 2–11.
[3] J. Castellanos, J. Montiel, J. Neira, and J. Tardos, "The SPmap: A probabilistic framework for simultaneous localization and map building," IEEE Trans. Robot. Automat., vol. 15, no. 5, pp. 948–953, 1999.
[4] H. Feder, J. Leonard, and C. Smith, "Adaptive mobile robot navigation and mapping," Int. J. Robotics Res., vol. 18, no. 7, pp. 650–668, 1999.
[5] J. Leonard and H. Feder, "A computationally efficient method for large-scale concurrent mapping and localization," in Proc. Ninth Int. Symp. Robotics Res., 2000, pp. 169–176.
[6] S. Se, D. Lowe, and J. Little, "Vision-based mobile robot localization and mapping using scale-invariant features," in Proc. IEEE Int. Conf. Robotics Automat., 2001, pp. 2051–2058.
[7] D. Anguelov, R. Biswas, D. Koller, B. Limketkai, and S. Thrun, "Learning hierarchical object maps of non-stationary environments with mobile robots," in Proc. Uncert. AI, 2002.
[8] D. Kortenkamp, R. Bonasso, and R. Murphy, Eds., Artificial Intelligence and Mobile Robots: Case Studies of Successful Robot Systems. MIT Press, 1998.
[9] G. Dudek and M. Jenkin, Computational Principles of Mobile Robotics. Cambridge University Press, 2000.
[10] R. Murphy, Introduction to AI Robotics. MIT Press, 2000.
[11] H. Moravec and A. Elfes, "High resolution maps from wide angle sonar," in Proc. IEEE Int. Conf. Robotics Automat., 1985, pp. 116–121.
[12] M. Mataric, "Integration of representation into goal-driven behavior-based robots," IEEE Trans. Robot. Automat., vol. 8, no. 3, pp. 304–312, 1992.
[13] H. Shatkay and L. Kaelbling, "Learning topological maps with weak local odometric information," in Proc. Int. Joint Conf. Artif. Intell., 1997, pp. 920–927.
[14] J. Borenstein and Y. Koren, "Histogramic in-motion mapping for mobile robot obstacle avoidance," IEEE Trans. Robot. Automat., vol. 7, no. 4, pp. 535–539, 1991.
[15] T. Sogo, H. Ishiguro, and T. Ishida, "Acquisition and propagation of spatial constraints based on qualitative information," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 3, pp. 268–278, 2001.
[16] S. Engelson and D. McDermott, "Error correction in mobile robot map learning," in Proc. IEEE Int. Conf. Robotics Automat., 1992, pp. 2555–2560.
[17] M. Golfarelli, D. Maio, and S. Rizzi, "Elastic correction of dead-reckoning errors in map building," in Proc. IEEE/RSJ Int. Conf. Intell. Robotic Syst., 1998, pp. 905–911.
[18] T. Duckett, S. Marsland, and J. Shapiro, "Learning globally consistent maps by relaxation," in Proc. IEEE Int. Conf. Robotics Automat., 2000, pp. 3841–3846.
[19] S. Thrun, "Learning metric-topological maps for indoor mobile robot navigation," Artif. Intell. J., vol. 99, no. 1, pp. 21–71, 1998.
[20] B. Kuipers, "The spatial semantic hierarchy," Artif. Intell. J., vol. 119, no. 1-2, pp. 191–233, 2000.
[21] A. Elfes, "Sonar-based real-world mapping and navigation," IEEE Trans. Robot. Automat., vol. 3, no. 3, pp. 249–265, 1987.
[22] G. Oriolo, G. Ulivi, and M. Vendittelli, "Real-time map building and navigation for autonomous robots in unknown environments," IEEE Trans. Syst., Man, Cybern., vol. 28, no. 3, pp. 316–333, 1998.
[23] S. Thrun, D. Fox, and W. Burgard, "A probabilistic approach to concurrent mapping and localization for mobile robots," Machine Learning, vol. 31, no. 1-3, pp. 29–53, 1998.
[24] B. Kuipers and Y. Byun, "A robot exploration and mapping strategy based on a semantic hierarchy of spatial representations," Robotics Autonomous Syst., vol. 8, pp. 47–63, 1991.
[25] D. Kortenkamp and T. Weymouth, "Topological mapping for mobile robots using a combination of sonar and vision sensing," in Proc. National Conf. Artif. Intell., 1994, pp. 979–984.
[26] D. Pierce and B. Kuipers, "Learning to explore and build maps," in Proc. National Conf. Artif. Intell., 1994, pp. 1264–1271.
[27] S. Koenig and R. Simmons, "Unsupervised learning of probabilistic models for robot navigation," in Proc. Int. Conf. Robotics Automat., 1996, pp. 2301–2308.
[28] G. Beccari, S. Caselli, and F. Zanichelli, "Qualitative spatial representations from task-oriented perception and exploratory behaviors," Robotics Autonomous Syst., vol. 25, pp. 147–157, 1998.
[29] R. Biswas, B. Limketkai, S. Sanner, and S. Thrun, "Towards object mapping in dynamic environments with mobile robots," in Proc. IEEE/RSJ Int. Conf. Intell. Robotic Syst., 2002.
[30] F. Lu and E. Milios, "Globally consistent range scan alignment for environment mapping," Autonomous Robots, vol. 4, pp. 333–349, 1997.
[31] T. Levitt and D. Lawton, "Qualitative navigation for mobile robots," Artif. Intell. J., vol. 44, no. 3, pp. 305–361, 1990.
[32] C. Schlieder, "Representing visible locations for qualitative navigation," in Qualitative Reasoning and Decision Technologies, N. P. Carret'e and M. G. Singh, Eds., 1993, pp. 523–532.
[33] A. Cohn and S. Hazarika, "Qualitative spatial representation and reasoning: An overview," Fundamenta Informaticae, vol. 46, no. 1-2, pp. 1–29, 2001.
[34] Y. Yagi, Y. Nishizawa, and M. Yachida, "Map-based navigation for a mobile robot with omnidirectional image sensor COPIS," IEEE Trans. Robot. Automat., vol. 11, no. 5, pp. 634–648, 1995.
[35] T. Cox and M. Cox, Multidimensional Scaling. Chapman & Hall/CRC, 2001.
[36] I. Borg and P. Groenen, Modern Multidimensional Scaling: Theory and Applications. Springer, 1997.
[37] G. Young and A. Householder, "Discussion of a set of points in terms of their mutual distances," Psychometrika, vol. 3, pp. 19–22, 1938.
[38] M. de Berg, M. van Kreveld, M. Overmars, and O. Schwarzkopf, Computational Geometry: Algorithms and Applications. Springer-Verlag, 1997.
[39] A. Bowyer, "Computing Dirichlet tessellations," Comput. J., vol. 24, no. 2, pp. 162–166, 1981.
[40] D. F. Watson, "Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes," Comput. J., vol. 24, no. 2, pp. 167–172, 1981.
[41] M. Boshra and B. Bhanu, "Predicting performance of object recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 22, no. 9, pp. 956–969, 2000.
[42] K. B. Sarachik, "The effect of Gaussian errors in object recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, no. 4, pp. 289–301, 1997.
[43] M. Lindenbaum, "Bounds on shape recognition performance," IEEE Trans. Pattern Anal. Machine Intell., vol. 17, no. 7, pp. 666–678, 1995.
[44] U. Grenander, A. Srivastava, and M. Miller, "Asymptotic performance analysis of Bayesian object recognition," IEEE Trans. Inform. Theory, vol. 46, no. 4, pp. 1658–1666, 2000.
[45] G. Jones and B. Bhanu, "Recognition of articulated and occluded objects," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 7, pp. 603–613, 1999.
[46] M. Pontil and A. Verri, "Support vector machines for 3D object recognition," IEEE Trans. Pattern Anal. Machine Intell., vol. 20, no. 6, pp. 637–646, 1998.
[47] C. Watkins and P. Dayan, "Technical note: Q-learning," Machine Learning, vol. 8, no. 3-4, pp. 279–292, 1992.
[48] L. B. Doyle, "Indexing and abstracting by association. Part 1," System Development Corporation, Tech. Rep. SP-718/001/00, 1962.
[49] J. Bollen and R. Luce, "Evaluation of digital library impact and user communities by analysis of usage patterns," D-Lib Magazine, vol. 8, no. 6, 2002.
[50] J. Curiel and G. Radvansky, "Mental organization of maps," J. Experiment. Psych., vol. 24, no. 1, pp. 202–214, 1998.
[51] R. Kitchin and R. Jacobson, "Techniques to collect and analyze the cognitive map knowledge of people with visual impairments or blindness: Issues of validity," J. Visual Impairment and Blindness, pp. 360–376, July-August 1997.
[52] P. Courrieu, "Straight monotonic embedding of data sets in Euclidean spaces," Neural Networks, vol. 15, pp. 1185–1196, 2002.
[53] T. Yairi and K. Hori, "Qualitative map learning based on co-visibility of objects," in Proc. Int. Joint Conf. Artif. Intell., 2003, pp. 183–188.
[54] ——, "Extension and evaluation of covisibility-based qualitative map learning method for mobile robots," in Proc. Int. Conf. Comput. Intell. Robotics Autonom. Syst., 2003.
[55] C. Faloutsos and K. Lin, "FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets," in Proc. ACM SIGMOD Int. Conf. Manag. Data, 1995, pp. 163–174.
[56] T. Fruchterman and E. Reingold, "Graph drawing by force-directed placement," Software - Practice and Experience, vol. 21, no. 11, pp. 1129–1164, 1991.
[57] R. Davidson and D. Harel, "Drawing graphs nicely using simulated annealing," ACM Trans. Graphics, vol. 15, no. 4, pp. 301–331, 1996.
[58] J. Gutmann and K. Konolige, "Incremental mapping of large cyclic environments," in Proc. IEEE Int. Symp. Comput. Intell. Robotics Automat., 2000.
[59] B. Yamauchi, "Frontier-based approach for autonomous exploration," in Proc. IEEE Int. Symp. Comput. Intell. Robotics Automat., 1997, pp. 146–151.
[60] A. Schultz, W. Adams, and B. Yamauchi, "Integrating exploration, localization, navigation and planning through a common representation," Autonomous Robots, vol. 6, pp. 293–308, 1999.
[61] S. Moorehead, R. Simmons, and W. Whittaker, "Autonomous exploration using multiple sources of information," in Proc. IEEE Int. Conf. Robotics Automat., 2001, pp. 3098–3103.
Takehisa Yairi received the B.Eng., M.Eng., and Dr.Eng. degrees in aeronautics and astronautics from the University of Tokyo (UT), Japan, in 1994, 1996, and 1999, respectively. From 1999 to 2003, he worked as a Research Associate at UT, and he became a Lecturer in 2003. His research interests include machine learning theory, spatial cognition of humans and robots, and fault detection / diagnosis systems for spacecraft using data mining and probabilistic reasoning techniques. He is a member of JSAI, RSJ, and JSASS, Japan.
Koichi Hori (Member, IEEE 4499273) received the B.Eng., M.Eng., and Dr.Eng. degrees in electronic engineering from the University of Tokyo, Japan, in 1979, 1981, and 1984, respectively. In 1984, he joined the National Institute of Japanese Literature, Japan, where he developed AI systems for literature studies. Since 1988, he has been with the University of Tokyo, Japan, where he is currently a professor in the Interdisciplinary Course on Advanced Science and Technology. From September 1989 to January 1990, he also held a visiting position at the University of Compiegne, France. His current research interests include AI technology for supporting human creative activities, cognitive engineering, and intelligent CAD systems. Professor Hori is a member of IEEE, ACM, IEICE, IPSJ, JSAI, JSSST, and JCSS.
Kosuke Hirama received the B.Eng. and M.Eng. degrees in aeronautics and astronautics from the University of Tokyo, Japan, in 1998 and 2000, respectively. He joined NTT DoCoMo, Inc. in 2000, where he has been engaged in the development of packet-switched core network systems for mobile communications. He is a member of IEICE, Japan.