Acquiring a Probabilistic Map with Dialogue-Based Learning

H. Asoh, Y. Motomura, I. Hara, S. Akaho, S. Hayamizu, T. Matsui

Real-World Computing Project Team, Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba, Ibaraki 305, Japan

Abstract: This paper describes an experiment in which dialogue-based learning is applied to map acquisition by a mobile Office-Conversant robot. The system learns a map of its environment through simple dialogue with human teachers. A formal probabilistic model is introduced as the representation of the map. The importance and effectiveness of a proper segmentation of the spatial-action space, and of statistical inference using probability distributions estimated on the segmented representation, are shown.

Key words: Office-Conversant Robot, Dialogue-Based Learning, Map Acquisition

1 Introduction

Learning is an indispensable capability for autonomous intelligent systems that work in complex and unpredictable real-world environments. In the course of the Real-World Computing Program we have been building a mobile Office-Conversant robot which autonomously moves in a real office environment, actively gathers information, and acquires knowledge about the environment by sensing multi-modal data and holding dialogue with people[7]. Our major concern here is "interactive learning using multi-modal real-world data", and the Office-Conversant system is one of the platforms on which we implement ideas and test their feasibility in a real-world setting.

Many learning schemes and algorithms have been proposed and investigated. However, most of them fall within the area of "learning from examples", where static learning examples prepared in advance are fed into the learning system. We claim that a learning system which is to behave robustly in a real-world environment should exploit a more dynamic learning scheme; in other words, the system should be coupled more tightly with its environment and with its human users. A very powerful communication channel between learning systems and human teachers is dialogue in natural or semi-natural language. This raises a question: how effectively can dialogue between systems and humans be used in the learning process, or help the system to learn?

The idea of "dialogue-based learning", which exploits dialogue in natural language for teaching, is rather old. Although the idea is simple and natural, the need for speech understanding has been a bottleneck, and not much effort has been made to realize such a system. Recently, as a result of progress in AI and pattern recognition research, speaker-independent continuous speech recognition and natural language understanding have reached an applicable level, and dialogue-based learning is becoming attractive for building real-world oriented autonomous robots. Intimate human-machine interaction can also help real-world intelligent systems collaborate with humans in daily life. In this paper we describe our first-step experiment, in which dialogue-based learning was applied to a map acquisition task.

2 Dialogue-based Map Acquisition

Map acquisition is an important learning task for mobile robots working in an office environment, and much research on map learning has been done. Typical approaches use the occupancy grid, which discretizes the world into small cells; the system learns (estimates) whether each cell is occupied by an obstacle or not[3][2]. Another class of methods is based on finite state transition networks[1][5][6][11]. Both approaches work well under some assumptions, and their effectiveness has been confirmed on several real autonomous mobile systems. However, closely coupled human-machine interaction has not been considered much in that work.

Here we try to apply the idea of dialogue-based learning to the map acquisition task. Our problem is how to utilize dialogue between humans and robots in the task. We found that designing a dialogue scenario with an appropriately articulated action space, and introducing a probabilistic description of the map, are very important for solving the problem.

2.1 Designing the Scenario and Action Space

Our scenario for the dialogue was designed as follows (R: robot, H: human teacher):

R: Where am I ?
H: You are in front of Dr.Nakashima's Office.
R: Where shall I go ?
H: Please go to Dr.Hara's Office.
R: Sorry, I don't know how to go to Dr.Hara's Office.
H: OK. Please go straight.
R: OK. (goes straight until an end-of-action condition is satisfied)
R: Where am I ?
H: You are in front of Dr.Matsui's Office.

In designing the scenario, the most important point is designing the robot's action elements. Designing the action elements is at the same time designing the articulation of the working space. The following design criteria are considered:

- The user (teacher) can easily designate an elemental action. Commands with metrical information such as "go straight 5 m" are not convenient.
- The user (teacher) can easily understand and predict the effect of an elemental action (such as "go straight", "right turn", etc.).

An office environment is characterized by its modular structure. Modules with the same appearance make up the whole structure, and it is difficult to attribute landmarks to specific locations. Figure 1 shows the floor plan of ETL's E-building, our field for the experiments. Taking this modularity into account we adopted a compartment-type articulation as in [8] and implemented it as follows. The space is articulated by the ends of elemental actions: each elemental action ends when its end-of-action condition is satisfied. Tentatively we implemented six elemental actions: go straight, right/left turn by free space following, right/left turn by wall following, and turnover and go straight. Adding new elemental actions is possible. The end-of-action condition is tentatively the same for all actions: an action ends when the system moves into OPEN space from CLOSE space. CLOSE space means a hallway between two parallel walls; OPEN space means a doorway or an intersection of hallways. Setting a different end-of-action condition for each elemental action is possible.
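To make this articulation concrete, the following Python fragment is a minimal sketch of how one elemental action might be driven until the end-of-action condition fires. It is illustrative only: the motion and sensing interfaces (read_sonar, step_motion, is_open_space) are hypothetical placeholders, not the actual Nomad 200 control library used in our system.

```python
# Sketch of executing one elemental action until the end-of-action
# condition is met (a CLOSE -> OPEN transition). All robot interfaces
# (read_sonar, step_motion, is_open_space) are hypothetical placeholders.

def run_elemental_action(action, read_sonar, step_motion, is_open_space):
    """Execute `action` ("go straight", "right turn by wall following", ...)
    step by step, stopping when the robot moves from CLOSE space
    (hallway between parallel walls) into OPEN space (doorway/intersection)."""
    was_open = is_open_space(read_sonar())
    distance, angle = 0.0, 0.0          # accumulated odometry d(t), a(t)
    while True:
        dd, da = step_motion(action)    # one small motion step; returns odometry increments
        distance += dd
        angle += da
        now_open = is_open_space(read_sonar())
        if now_open and not was_open:   # end-of-action: entered OPEN space from CLOSE space
            break
        was_open = now_open
    return distance, angle              # becomes part of the observation D(t) for this action
```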


Figure 1: Map of ETL E-building

To discriminate OPEN space from CLOSE space, the following features, which are invariant under shift and rotation of the system, were prepared, and a discriminant function was constructed with the simple discriminant analysis method:

- the number of detected walls around the system;
- the spatial correlation, maximum, and minimum of the sums of the distance values from pairs of sonars pointing in opposite directions.
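As a rough illustration of how such features and a discriminant might be computed from the 16 sonar readings, the following Python sketch can be considered. It is an assumption-laden sketch, not the paper's implementation: the wall-detection threshold, the exact correlation measure, and the discriminant weights are placeholders (in the paper the discriminant is fitted by simple discriminant analysis on labelled data).

```python
import numpy as np

def open_close_features(sonar):
    """Shift/rotation-invariant features from 16 sonar range readings (sketch).
    `sonar` is a length-16 array of distances from evenly spaced sensors."""
    sonar = np.asarray(sonar, dtype=float)
    # Sum of each pair of opposite sonars ~ local "corridor width" in 8 directions.
    widths = sonar[:8] + sonar[8:]
    # Crude wall count: directions whose reading is short enough to be a wall
    # (the 2.0 m threshold is an illustrative guess, not the paper's value).
    n_walls = int(np.sum(sonar < 2.0))
    # Spatial correlation of the width profile with its circular shift,
    # plus its max and min, as rotation-invariant summaries.
    shifted = np.roll(widths, 1)
    corr = float(np.corrcoef(widths, shifted)[0, 1]) if widths.std() > 0 else 1.0
    return np.array([n_walls, corr, widths.max(), widths.min()])

def is_open_space(sonar, w=np.array([-0.5, -1.0, 0.8, 0.6]), b=-1.0):
    """Linear discriminant OPEN vs. CLOSE; the weights here are placeholders."""
    return float(np.dot(w, open_close_features(sonar)) + b) > 0.0
```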

2.2 Probabilistic Map

The real-world environment is full of uncertainty, mainly caused by unexpected noise. A formal probabilistic model was introduced to cope with this uncertainty.

Let the system's status at time t be denoted by D(t) ∈ D, S(t) ∈ S, and A(t) ∈ A, where D(t) is the observed sensory information, S(t) is the location state of the system, which is not directly observable by the system, and A(t) is the action executed at time t. Let G represent the goal which the system is tentatively aiming for. A stochastic process <D(t), S(t), A(t), G> is used to describe the whole probabilistic structure of the environment and of the system's behaviour under the condition that the system is running toward the goal G.

Like many researchers, we assume that the whole stochastic process is a Markov process, that is, the state transition probabilities P(S(t) | S(t-1), A(t-1)), the observation probabilities P(D(t) | S(t), S(t-1)), and the probabilistic plan P(A(t) | S(t), G) for reaching G do not depend on the other conditions in past time. The equations

P(S(t) | S(t-1), A(t-1)) = P(S(t) | S(t-1), ..., S(0), D(t-1), ..., D(1), A(t-1), ..., A(0), G),

P(D(t) | S(t), S(t-1)) = P(D(t) | S(t), S(t-1), ..., S(0), D(t-1), ..., D(1), A(t-1), ..., A(0), G),

and

P(A(t) | S(t), G) = P(A(t) | S(t), S(t-1), ..., S(0), D(t), ..., D(1), A(t-1), ..., A(0), G)

hold for any combination of the conditioning variables.

This assumption implies that the result of an action A(t-1) depends only on S(t-1). This does not always hold: when the robot enters a state S(t-1), its position and orientation within that state are not uniquely determined, so the consequence of action A(t-1) may differ depending on the position and orientation. To cancel this effect and maintain the assumption, potential-based motor control is used. This method creates a virtual potential valley between the walls, and the system moves along the bottom of the valley; the system normally keeps to the center of the corridor as long as there is no large obstacle. Even when there are obstacles, after avoiding an obstacle the system is expected to re-enter the potential valley with high probability and recover its normal position and orientation (see Figure 4).

Under this assumption, the probability P(S(t), ..., D(t), ..., A(t-1), ..., G) can be computed from P(S(t-1), ..., D(t-1), ..., A(t-2), ..., G) and the above conditional probabilities. By the definition of conditional probability,

P(S(t), ..., D(t), ..., A(t-1), ..., G)
  = P(D(t) | S(t), ..., D(t-1), ..., A(t-1), ..., G)
  × P(S(t) | S(t-1), ..., D(t-1), ..., A(t-1), ..., G)
  × P(A(t-1) | S(t-1), ..., D(t-1), ..., A(t-2), ..., G)
  × P(S(t-1), ..., D(t-1), ..., A(t-2), ..., G)

holds, and with the Markov assumption we obtain

P(S(t), ..., D(t), ..., A(t-1), ..., G)
  = P(D(t) | S(t), S(t-1))
  × P(S(t) | S(t-1), A(t-1))
  × P(A(t-1) | S(t-1), G)
  × P(S(t-1), ..., D(t-1), ..., A(t-2), ..., G).
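This factorisation can be turned into a recursive update. The following Python sketch shows the standard forward-filtering form that follows from the same Markov factorisation when only the distribution over the current state is needed; the callable interfaces (trans_prob, obs_likelihood) are hypothetical, not part of the original system, which was written in C.

```python
# Minimal sketch of the forward update implied by the Markov factorisation above.
# States are discrete labels (e.g. "A:S", "B:S"); the transition model
# P(S(t) | S(t-1), A(t-1)) and the observation likelihood P(D(t) | S(t), S(t-1))
# are assumed to be given as Python callables (hypothetical interfaces).

def forward_update(belief, action, observation, trans_prob, obs_likelihood, states):
    """belief: dict mapping state -> P(S(t-1) = state | history).
    Returns the normalised belief over S(t) after executing `action`
    and observing `observation`."""
    new_belief = {}
    for s_now in states:
        total = 0.0
        for s_prev, p_prev in belief.items():
            total += (obs_likelihood(observation, s_now, s_prev)
                      * trans_prob(s_now, s_prev, action)
                      * p_prev)
        new_belief[s_now] = total
    z = sum(new_belief.values())
    if z > 0.0:
        new_belief = {s: p / z for s, p in new_belief.items()}
    return new_belief
```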

In this framework, a probabilistic map is defined as follows.

Definition: A probabilistic map is the probability distribution P(S(t), D(t) | S(t-1), A(t-1)), that is, the probability of being in state S(t) and observing D(t) at time t, conditioned on being in state S(t-1) and executing action A(t-1) at time t-1.

The distribution P(S(t), D(t) | S(t-1), A(t-1)) is computed from the two distributions P(S(t) | S(t-1), A(t-1)) and P(D(t) | S(t), S(t-1)). In our tentative system, A(t) represents one of the six elemental actions, and D(t) represents the running distance d(t) and the accumulated steering angle a(t) during the elemental action A(t-1). If the system has asked the question "Where am I ?", the answer to the question is added to D(t). S represents the position state where the system is. This representation is a kind of probabilistic finite automaton. If we could obtain absolute coordinate information as part of the observed data D(t), we could easily incorporate it into the probabilistic map, and the map would become an integrated form of the finite automaton and the occupancy grid.

To treat the continuous values from the odometric sensors, we assume that the distribution P(d(t), a(t) | S(t), S(t-1)) is the product of two normal distributions:

P(d(t), a(t) | S(t), S(t-1))
  = C × exp( -(d(t) - μ_{d|S(t),S(t-1)})^2 / (2 σ_{d|S(t),S(t-1)}^2) )
      × exp( -(a(t) - μ_{a|S(t),S(t-1)})^2 / (2 σ_{a|S(t),S(t-1)}^2) ).

Here μ_{d|S(t),S(t-1)} and μ_{a|S(t),S(t-1)} are the mean values, and σ_{d|S(t),S(t-1)} and σ_{a|S(t),S(t-1)} are the standard deviations, of d(t) and a(t) conditioned on S(t) and S(t-1); C is a normalization constant. If enough examples are available, non-parametric estimation of the distribution can also be used.
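As an illustration of this observation model, the following Python sketch evaluates the product-of-Gaussians likelihood. The parameter tuple layout and the variance floor are assumptions; the example values are taken from the first row of Table 1.

```python
import math

def odometry_likelihood(d, a, params):
    """P(d, a | S(t), S(t-1)) as a product of two normal densities (sketch).
    `params` holds (mu_d, sigma_d, mu_a, sigma_a) estimated for the
    transition S(t-1) -> S(t); the field layout is illustrative."""
    mu_d, sigma_d, mu_a, sigma_a = params

    def normal(x, mu, sigma):
        sigma = max(sigma, 1e-3)  # guard against zero variance from few samples
        return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

    return normal(d, mu_d, sigma_d) * normal(a, mu_a, sigma_a)

# Example with values from the first row of Table 1 (B:S -> C:S, "go straight"):
# mean distance 144.5 in, s.d. 3.9 in, mean angle 10.5 deg, s.d. 4.3 deg.
p = odometry_likelihood(146.0, 9.0, (144.5, 3.9, 10.5, 4.3))
```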

The current position (state) is estimated as follows. The probability P(S(t) | D'(t), S(t-1), A(t-1)) is computed from P(D'(t) | S(t), S(t-1)) and P(S(t) | S(t-1), A(t-1)) with Bayes' formula:

P(S(t) | D'(t), S(t-1), A(t-1)) = C × P(D'(t) | S(t), S(t-1)) × P(S(t) | S(t-1), A(t-1)),

where D'(t) denotes the sensory information obtained without asking questions and C is a normalization factor. The system estimates its location state as the S(t) which attains the maximum value of this probability,

Pmax = max_{S(t) ∈ S} P(S(t) | D'(t), S(t-1), A(t-1)).

If Pmax is larger than a previously determined threshold, the system assumes that it is in the S(t) which attains the maximum and executes the next action. If Pmax falls below the threshold, the system makes a confirmation of its current position: it asks the teacher the question "Where am I ?". Other confirmation methods, such as vision-based position recognition, could also be introduced. As a consequence, in both cases the system knows where it is with high confidence, which is very useful for avoiding the accumulation of location uncertainty.

For path planning, the usual shortest-path algorithm from graph theory (Dijkstra's algorithm) is applied to the probabilistic map. The success rate of a path is evaluated, and paths with a low success rate are discarded.
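A minimal Python sketch of the position-estimation and confirmation step described above is given here. The map interfaces (trans_prob, obs_likelihood, e.g. the odometry_likelihood sketch from above) and the 0.7 threshold are assumptions, not the values used in the implemented system.

```python
def estimate_position(prev_state, action, observation, trans_prob, obs_likelihood,
                      states, threshold=0.7):
    """One localization step (sketch). `trans_prob(s, prev, action)` returns
    P(S(t)=s | S(t-1)=prev, A(t-1)=action); `obs_likelihood(observation, s, prev)`
    returns P(D'(t) | S(t)=s, S(t-1)=prev). Both are assumed interfaces onto
    the learned probabilistic map."""
    posterior = {}
    for s in states:
        posterior[s] = (obs_likelihood(observation, s, prev_state)
                        * trans_prob(s, prev_state, action))
    z = sum(posterior.values())
    if z == 0.0:
        return None, True                      # nothing matches: ask "Where am I ?"
    posterior = {s: p / z for s, p in posterior.items()}
    best = max(posterior, key=posterior.get)
    ask_teacher = posterior[best] < threshold  # low Pmax: confirm by asking the teacher
    return best, ask_teacher
```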

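For the planning step, a corresponding sketch runs Dijkstra's algorithm over the learned transitions. The paper does not specify the edge cost or the rejection threshold; here the mean distance is used as the cost and the path success rate is the product of per-edge success rates, both stated as assumptions.

```python
import heapq

def plan_path(map_rows, start, goal, min_success=0.5):
    """Dijkstra over the learned map (sketch). Each map row is taken as
    (start_state, goal_state, action, mean_distance, n_success, n_trial).
    The cost function and the 0.5 cutoff are assumptions, not the paper's values."""
    graph = {}
    for s, g, action, mean_d, n_succ, n_trial in map_rows:
        rate = n_succ / n_trial if n_trial else 0.0
        graph.setdefault(s, []).append((g, action, mean_d, rate))

    # Queue entries: (cost so far, state, path as [(action, next_state)], success so far)
    queue = [(0.0, start, [], 1.0)]
    visited = set()
    while queue:
        cost, state, path, success = heapq.heappop(queue)
        if state == goal:
            return (path, success) if success >= min_success else (None, success)
        if state in visited:
            continue
        visited.add(state)
        for nxt, action, mean_d, rate in graph.get(state, []):
            if nxt not in visited:
                heapq.heappush(queue,
                               (cost + mean_d, nxt, path + [(action, nxt)], success * rate))
    return None, 0.0

# Example usage with two rows in the style of Table 1:
rows = [("B:S", "C:S", "go straight", 144.5, 6, 6),
        ("C:S", "D:S", "go straight", 689.0, 5, 5)]
path, rate = plan_path(rows, "B:S", "D:S")
```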
3 System and Experiment

We use a Nomad 200 as the mobile robot platform (see Figures 2 and 5). It is equipped with 16 sonar distance sensors, 16 infrared proximity sensors, touch sensors, a compass, and odometric sensors which measure running distance and steering angle, and it communicates with an external host computer (a 4-CPU Sparc Station 20) via a radio Ethernet link. All the control software modules shown in Figure 3 are written in C using the robot control library provided by Nomadic Co. and run on the external host.

Figure 2: Hardware Configuration

Figure 3: Software Modules

As mentioned above, potential-based free-space following and wall following are used to implement the elemental actions. An example trace of the moving robot, together with the reports from the sonar sensors during the move, is shown in Figure 4; the robot is moving from point "B" in Figure 1 to point "D". For dialogue control we tentatively implemented a very simple pattern-matching based module.

Figure 4: Trace of Robot with Sonar Reports (from "B" to "D" in Figure 1)

The teacher's speech, collected by a microphone on the Nomad 200, is sent to the host over an analog UHF transmitter. A real-time, speaker-independent, continuous speech recognition module developed at ETL[4] analyses the speech signal and extracts key patterns such as "Dr.Nakashima" or "go straight".

The experiments are made on one floor of our laboratory's building (see Figures 1 and 5). The system starts with no map information and acquires a probabilistic map through dialogue with human teachers. An example map, acquired by moving in the corridor between "A" and "F" in Figure 1, is shown in Table 1; a part of the dialogue used to acquire the map is shown in the appendix.

Figure 5: Field of the Experiment (on the way from "D" to "C")

Each row of Table 1 gives the start position, the goal position, the elemental action executed, the mean and standard deviation of the distance and of the angle change, the number of trials, and the number of successes. Positions are denoted by the labels in Figure 1: for example, "A:S" means standing at position "A" in Figure 1 facing south (to the right in Figure 1). Distances are in inches and angles in degrees. To discriminate the direction the system faces, the compass is used; currently four directions are discriminated: north (facing left in Figure 1), east, west, and south.

The map is modified after the execution of each elemental action. For example, suppose the system has been at "A:S". It executes the "go straight" action and arrives at a novel state. It asks "Where am I ?" because it has no experience of the move "go straight from A:S". If the teacher answers "B", the system knows the place is "B:S" by combining the answer with the compass signal, and one new row describing the move is added to the map. If a move the system has already made in its past experience is executed again, the existing row describing the move is modified. Noise effects may make the robot halt at an unexpected position (e.g. a non-branching point); in such cases the teacher tells the robot a special state name, "on the way". This state is not used in path planning. This trick is rather effective for discarding confusing information.
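The bookkeeping for this update can be sketched as follows in Python. The class and function names are illustrative only, and success/trial counting and the handling of the "on the way" state are simplified; the sketch just shows how repeated moves update the running mean and standard deviation stored in a map row.

```python
import math

class MapRow:
    """One row of the probabilistic map: statistics of a single
    (start state, action) -> goal state move (sketch, illustrative fields)."""
    def __init__(self, start, goal, action):
        self.start, self.goal, self.action = start, goal, action
        self.n = 0
        self.sum_d = self.sum_d2 = 0.0    # running sums for distance
        self.sum_a = self.sum_a2 = 0.0    # running sums for angle

    def add_trial(self, d, a):
        self.n += 1
        self.sum_d += d; self.sum_d2 += d * d
        self.sum_a += a; self.sum_a2 += a * a

    def stats(self):
        mu_d = self.sum_d / self.n
        mu_a = self.sum_a / self.n
        sd_d = math.sqrt(max(self.sum_d2 / self.n - mu_d ** 2, 0.0))
        sd_a = math.sqrt(max(self.sum_a2 / self.n - mu_a ** 2, 0.0))
        return mu_d, sd_d, mu_a, sd_a

def update_map(rows, start, action, reached_state, d, a):
    """Add or modify the row describing the executed move. States named
    "on the way" are stored but should be skipped by the path planner."""
    key = (start, action, reached_state)
    if key not in rows:
        rows[key] = MapRow(start, reached_state, action)
    rows[key].add_trial(d, a)
    return rows[key]
```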

Start  Goal   Action                      μd      σd      μa      σa    Success  Trial
B:S    C:S    go straight               144.5     3.9    10.5     4.3      6       6
C:S    D:S    go straight               689.0    18.0    -8.3     4.2      5       5
D:S    E:S    go straight               169.0    41.0   -19.6     2.2      3       3
E:S    F:S    go straight               587.3    74.9     7.4     0.5      3       4
F:S    E:N    turnover and go straight  494.3    17.6   165.7     8.2      3       3
E:N    D:N    go straight                88.0    31.2    26.5    50.3      3       3
D:N    C:N    go straight               779.0    24.8    -3.2     3.1      3       3
C:N    B:N    go straight                98.7    72.5    -3.9     3.8      4       5
B:N    A:N    go straight               644.7    37.9    -1.6     3.2      3       3
A:N    B:S    turnover and go straight  476.3     6.9   172.7    12.1      7       7
C:S    B:N    turnover and go straight   72.0    21.0  -167.3     1.3      2       2
E:S    E:S    go straight                16.0     0.0   -19.0     0.0      1       4
C:N    A:N    go straight              1375.0     0.0     5.4     0.0      1       5
C:S    B:N    turnover and go straight   61.3    11.9   169.7     3.5      3       3
D:S    C:N    turnover and go straight  529.7    16.4   174.3     2.5      3       3
C:N    D:S    turnover and go straight  507.0     0.0  -179.6     0.0      1       1
B:S    A:N    turnover and go straight  475.0     0.0  -179.4     0.0      1       1

Table 1: An Example of an Acquired Map (μd, σd: mean and standard deviation of the distance; μa, σa: mean and standard deviation of the angle change. See also Figure 1.)

This small map already includes some interesting events. First, we can see that in some rows the standard deviation σd or σa is very large. For example, in rows 4 and 8, σd is 74.9 and 72.5 respectively. These large values suggest that wheel slip occurred or that temporary obstacles (e.g. humans) disturbed the system's path. Rows 8 and 13 show that the system missed point "B:N" once in five trials. Row 12 shows that the end-of-action condition happened to be satisfied by mistake. Using the probabilistic map, we can keep this kind of probabilistic information about the environment and utilize it for path planning and location estimation.

4 Discussion

The feasibility of dialogue-based learning is confirmed. Although the dialogue capability of the system is very limited, the system can learn a map without much effort on the part of the human teachers. From our experiment we gained some insights into the features of dialogue-based learning. Dialogue-based learning has the following advantages over purely sensor-based learning:

- On-the-job, on-demand supply of information. The system is always in learning mode and asks questions autonomously. Combined with the context information which both the teacher and the system have in common, simple dialogue can have a rather large effect.
- Purposive design of the dialogue is possible. Because the context of the dialogue is limited to a specific learning task, dialogue with a rather small vocabulary and grammar can work well. To cope with multiple learning targets, switching between small grammars and dictionaries is expected to be effective.
- The teacher becomes familiar with the system. The teaching effort is strongly motivated by the close interaction with the system, and estimating the knowledge status of the system becomes easy.
- Structural information can be provided. Information with a complex structure, such as path plans, can be conveyed through semi-natural language dialogue.

The following points are possible drawbacks:

- frustration caused by the limited communication capability;
- teaching is a time-consuming job.

The importance of the probabilistic model is also confirmed. The major advantages of maintaining the probability information are:

- The system can reflectively assess the certainty of its beliefs and autonomously ask questions.
- The system can learn stochastic features of the environment, such as a frequently occupied or very slippery hallway.
- The system becomes robust against noise and mistakes.

5 Related Work

The state-transition-automaton style of map description was introduced by Kuipers and Byun[5] and has been used by other researchers[6][11]; these maps are not probabilistic. Simmons and Koenig[10] proposed a probabilistic map which is very similar to ours. The main difference between our map and theirs is in the design of S, A, and D. They used the assumption of perpendicularly crossing corridors and implemented three elemental actions: turning right 90 degrees, turning left 90 degrees, and going forward one meter. As a consequence, their position states S are located every one meter. We think our design is a little more flexible, because it can cope with a free layout of corridors, and more suitable for combination with dialogue. As for D, we treat continuous-valued information while Simmons et al. treated discrete distances. Our design of S and A is inspired by the compartment-type quantization of the office environment used for the mobile robot DERVISH[8].

As for the introduction of dialogue, the well-known intelligent robot SHAKEY developed by Nilsson is one of the earliest implementations of dialogue-based learning[9]. Mark Torrance has implemented a mobile robot which learns a map through natural language dialogue[12]. Basye and Dean investigated the learnability of probabilistic maps[1]; they pointed out that avoiding the accumulation of uncertainty is very important for successful learning. Simmons and Koenig also reported the problem of computational complexity in using their probabilistic map[10]. We have shown that by introducing dialogue, this accumulation can be avoided very effectively. The main contribution of our work is the combination of a formal probabilistic map, the idea of dialogue-based learning, and real-time continuous spoken language understanding.

6 Conclusion and Future Work

We have successfully applied dialogue-based learning to map acquisition for mobile autonomous robot navigation. The implemented system can hold simple dialogue with a human teacher in spoken semi-natural language and acquire a probabilistic map of the environment.

The implemented system has many points to be improved. The most important one is the reduction of the number of queries and of the teaching time. We tentatively assumed that the system can get a response from the teacher at any time; however, in real situations there may be cases where no person is near the system. To cope with this point we are now planning:

- Integrating vision-based landmark navigation.
- Introducing a reasoning capability on the map. For example, in the system of Torrance, reversing an acquired path is possible. We are also planning to reconstruct a two-dimensional bird's-eye-view map from the topological state transition structure and the distances between neighboring states.
- Dividing the problem into learning the state transition structure and learning the states' names. After autonomously learning the structure of the state transitions, the system would ask human teachers for the names of the states.

As for the capability of dialogue, we are planning to implement a semantic analysis mechanism. To realize comfortable dialogue, the grammar and the dictionary should be tuned based on cognitive science research on human-machine dialogue. Improvement of the speech recognition module to cope with real-world environments with much noise is also necessary.

Application of dialogue-based learning to other tasks is an interesting research issue. In the project of building the Office-Conversant robot, we are planning to apply the learning strategy to many other targets. Design of an appropriate action space is the most important factor for successful learning; however, the design is heavily task and environment dependent. In this research the action space was designed and implemented by humans. In order to make dialogue-based learning more effective, self-organization of the action space will become an important and interesting research issue in our next development.

Acknowledgement

This work is supported by the Real-World Computing Program organized by the Ministry of International Trade and Industry of the Japanese Government. We thank the members of the Real-World Computing project team and AIUEO for helpful discussions. The potential-method algorithm for obstacle avoidance was originally programmed by Alex Zelinsky and Yasuo Kuniyoshi.

References

[1] K. Basye, T. Dean, and J. S. Vitter: Coping with uncertainty in map learning. In Proceedings of IJCAI-89, 663-668, 1989.

[2] J. Buhmann: The mobile robot RHINO. AI Magazine, Summer 1995, 31-38 (1995).

[3] A. Elfes: Multi-source spatial data fusion using Bayesian reasoning. In M. A. Arbib and R. C. Gonzales (eds.), Data Fusion in Robotics and Machine Intelligence, 137-163, Academic Press (1992).

[4] K. Itou, S. Hayamizu, K. Tanaka, H. Tanaka: System design, data collection and evaluation of a speech dialogue system. IEICE Transactions on Information & Systems, E76-D, 121-127 (1993).

[5] B. J. Kuipers and Y.-T. Byun: A robust, qualitative method for robot spatial reasoning. In Proceedings of AAAI-88, 774-779, 1988.

[6] M. Mataric: Integration of representation into goal-driven behavior-based robots. IEEE Transactions on Robotics and Automation, 8, 304-312 (1992).

[7] T. Matsui, S. Hayamizu, H. Asoh, I. Hara, and Y. Motomura: Office-Conversant mobile robot project. Proc. of the 1995 Annual Conference of the Robotics Society of Japan (1995) (in Japanese).

[8] I. Nourbakhsh, R. Powers, and S. Birchfield: DERVISH: An office-navigating robot. AI Magazine, Summer 1995, 53-60 (1995).

[9] N. J. Nilsson: A mobile automaton: an application of artificial intelligence techniques. Proc. of the First IJCAI, 509-520 (1969).

[10] R. Simmons and S. Koenig: Probabilistic robot navigation in partially observable environments. In Proceedings of IJCAI-95, 1080-1087, 1995.

[11] J. Tani: Dynamical systems approach in learnable autonomous robots. Proc. IIW-95, 241-249 (1995).

[12] M. C. Torrance: Natural Communication with Robots. Master's Thesis, Stanford University Department of Electrical Engineering and Computer Science (1991).

Appendix: A part of the dialogue for acquiring the map shown in Table 1

R: Where am I ?
H: You are in front of Dr.Matsui's Office (B:S).
R: Where shall I go ?
H: Please go to Dr.Hara's Office (F:3).
R: Sorry, I don't know how to go to Dr.Hara's Office.
H: OK. Please go straight.
R: OK. (goes straight until an end-of-action condition is satisfied)
R: Where am I ?
H: You are in front of Dr.Onda's Office (C:S).
R: Where shall I go ?
H: Go straight more.
R: OK. (goes straight again until an end-of-action condition is satisfied)
R: Where am I ?
H: You are in front of Dr.Ogasawara's Office (D:S).

...

R: Where am I ?
H: You are in front of Dr.Matsui's Office (B:S).
R: Where shall I go ?
H: Please go to Dr.Ogasawara's Office (D:3).
R: OK. (goes straight and arrives at C:S)
R: Where am I ?
H: You are in front of Dr.Onda's Office (C:S).
R: OK. I'm on the right way. (goes straight and arrives at D:S)
R: I'm at Dr.Ogasawara's Office (D:S).
R: Where shall I go ?
H: Go to Dr.Matsui's Office (B:3).
R: OK. I'll turnover and go straight. (misses B and stops at A:N)
R: Oh, I'm lost. Where am I ?
H: You are in front of Dr.Nakashima's Office (A:N).
R: Thank you. I'll turnover and go straight. (replans the path to Dr.Matsui's Office and executes it)
R: I'm at Dr.Matsui's Office (B:S).

...

(Here A:3 means A:N or A:S.)
