ACTIVE ROBOTIC SENSING AS DECISION MAKING WITH STATISTICAL METHODS
L. Mihaylova*, T. Lefebvre, H. Bruyninckx, J. De Schutter
*Universiteit Gent, SYSTeMS Group, B-9052 Zwijnaarde, Gent, Belgium, E-mail: [email protected]
Katholieke Universiteit Leuven, Dept. of Mechanical Engineering, Celestijnenlaan 300B, Belgium
Abstract:
Active robotic sensing is a broad field aimed at providing robotic systems with tools and methods for decision making under uncertainty, e.g. in a changing environment or with insufficient information. Active sensing (AS) incorporates the following aspects: (i) where to position sensors, and (ii) how to make decisions for next actions, in order to extract maximum information from the sensor data and to minimize costs such as travel time and energy. We concentrate on the second aspect: “Where should the robot move at the next time step?” and present AS in a probabilistic decision-theoretic framework. The AS problem is formulated as a constrained optimization with a multi-objective criterion combining an information gain and a cost term with respect to the generated actions. Solutions for AS of autonomous mobile robots are given to illustrate the framework.
Key words:
active sensing, autonomous robots, data fusion, estimation, decision making, information criterion, Bayesian methods
1. INTRODUCTION
For a long time the goal of building autonomous robotic navigation systems has been central to the robotics community. In order to perform different tasks, autonomous robots need to move safely from one location to another. This is only possible when the robot is equipped with sensors, e.g.
cameras, encoders, gyroscopes, contact or force sensors. To perform a task, the robot first needs to know: “Where am I now?” After that, the robot needs to decide “What to do next?” and to perform a sequence of actions. The latter decision-making process is called active sensing (AS). The active sensing, or active perception, paradigm was introduced by Bajcsy [1, 2], Aloimonous et al. [3], and Ballard [4] in the context of task-directed choice of the controllable parameters of a sensing system. The methods developed there aim at adaptively changing camera parameters (e.g. positions, focus or aperture) and at efficient data processing in order to improve perception. Either the sensor parameters or the processing resources allocated to the system are controlled [5].
An action is a particular kind of event leading to a change in the robot state or in the state of the world. The states capture all the information relevant to the robot's decision-making process. Previewing both immediate and long-term effects is a prerequisite for choosing actions: the robot should take actions that bring it closer to task completion, e.g. reaching a goal position within a certain tolerance, as well as actions for the purpose of information gathering, such as searching for a landmark, going around obstacles, or reading signs in a room, in order to keep its uncertainty small enough at each time instant. The robot should then be able to deal with static as well as unknown and dynamic obstacles (e.g. moving people), and in this way perform quickly changing tasks in a quickly changing environment. Examples are: mobile robot navigation, where the robot has to move safely and quickly under uncertainty; industrial robot tasks in which the robot is uncertain about the positions and orientations of its tools and work pieces, e.g. drilling, welding, polishing [6]; and vision applications, e.g. active selection of camera parameters such as focal length and viewing angle to improve object recognition procedures [7, 8].
Active vision (or active sensing in general) refers to the control of sensing parameters to improve the robustness of the feature extraction process. Active vision applications can be divided into four major classes [5]: active vision, active perception, animate vision, and purposive vision. Active vision, introduced by Aloimonous et al. [3], provides a mathematical analysis of complex problems such as stability, linearity and uniqueness of solutions. A large group of active vision methods focus on the search for a solution in the robot configuration space (the space describing all possible positions of the robot). Other methods [9] center their work on the image feature space. Depending on the position of the camera, methods may employ a stationary camera head or move the camera around the object so as to look at it from different viewpoints, the so-called viewpoint planning [10]. The goal of active perception, as defined by Bajcsy [2], is to elaborate strategies for setting sensor parameters in order to improve the knowledge of the environment.
Animate vision [4] is based on the analysis of human perception. The aim of purposive vision is to extract from the environment the information needed to ensure the realization of the task at hand. However, visual information includes uncertainty caused by quantization or calibration errors. In addition, visual processing is costly, because the amount of image data is large and because of the relatively complicated reasoning involved. When navigating using visual information, the robot has to reach a reasonable compromise between safety and efficiency [11]. If the robot reduces the number of observations in order to move fast, this can lead to cumulative motion uncertainty. On the other hand, increasing the number of observations leads to safer motion, but arrival at the goal configuration is delayed.
1.1 Estimation, control and active sensing
The inherent modules of an intelligent sensing system are estimation, control and active sensing. The estimation part is carried out by stochastic estimators which, based on the sensor and robot models and after fusing the sensor data, generate estimates of the robot states and parameters. Given the desired task, the controller is charged with completing the task as accurately as possible. Motion execution can be achieved by feedforward commands, feedback control, or a combination of both [12]. Active sensing (AS), finally, is the process of determining the inputs by optimizing an optimality criterion; these inputs are then sent to the controller [13,6,14]. AS is a decision-making process, performed at each time instant, delayed over some time period [15], or performed upon request after processing data from one or more sensors [16,17] through multi-sensor data fusion. AS is challenging for many reasons:
• nonlinearity of the robot and sensor models;
• the need for an optimality criterion able to account for information gain and other costs (such as travel distance or distance to obstacles);
• the high computational load (time, number of operations), especially important for on-line tasks;
• uncertainties in the robot model, the environment model and the sensor data;
• measurements that often do not supply information about all variables, i.e. the system is partially observable;
• geometric sensing problems [17], such as those requiring a description of the shape, size and position of objects.
The rest of the paper is organized as follows. Section 2 formulates the AS problem within a statistical framework and considers the most often used optimality criteria for information extraction. Section 3 presents the main groups of optimization algorithms for AS. Section 4 gives examples, and Section 5 concludes the paper.
2. PROBLEM FORMULATION
Active robotic sensing can be considered as trajectory generation for a stochastic dynamic system described by the model
x_{k+1} = f(x_k, u_k, η_k),   (1)

z_{k+1} = h(x_{k+1}, s_{k+1}, ξ_{k+1}),   (2)
where x is the system state vector, f and h are in general nonlinear system and measurement functions, z is the measurement vector, and η and ξ are, respectively, the system and measurement noise (additive or multiplicative), with covariances Q_k and R_k. u denotes the input vector of the state function (e.g. the robot speed); s stands for a sensor parameter vector as input of the measurement function (an example is the focal length of a camera). The subscript k denotes the time step. Further, we denote both inputs u and s to the system by a (actions). The sensors introduce uncertainties due to statistical errors (usually well modeled by probability measures) and quantization errors (friction or other uncertainties that are more difficult to model via statistical methods) [17]. A multi-objective performance criterion (often called a value function) is needed to quantify, for each sequence of actions a_1, …, a_N (also called a policy), both the information gain and some costs in task execution:
J(x, z) = min_{a_1,…,a_N} { Σ_j α_j U_j + Σ_l β_l C_l }.   (3)
This measure is general and appropriate for almost all sensing tasks, without assumptions about the particular sensing modality or the task at hand. It is composed of a weighted sum of rewards: (i) terms U_j characterizing the minimization of expected uncertainties (maximization of expected information extraction), and (ii) terms C_l denoting other expected costs, such as travel distance, time, energy, distances to obstacles, or distance to the goal. Both U_j and C_l are functions of the policy a_1, …, a_N. The weighting coefficients α_j and β_l give a different impact to the two parts and are chosen by the designer according to the task context. When the state at the goal configuration fully determines the rewards, U_j and C_l are computed based on this state only. When attention is paid to both the goal configuration and the intermediate time evolution, the terms U_j and C_l are functions of the robot state at different time steps k.
Criterion (3) is to be minimized with respect to the sequence of actions under constraints c(x_1, …, x_N, a_1, …, a_N) ≤ c_thr, where c is a vector of physical variables that cannot exceed some threshold values c_thr, e.g. maximum allowed velocities and accelerations.
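To make the structure of criterion (3) and its constraints concrete, the following minimal sketch evaluates J for one candidate policy. The uncertainty and cost terms, the weights, and the velocity bound are invented placeholders, not values from the text.

```python
# Hypothetical sketch of evaluating criterion (3) for one candidate policy.
# u_terms / c_terms stand in for the U_j and C_l of the paper; the weights
# and the constraint threshold are illustrative only.

def criterion_J(policy, alpha, beta, u_terms, c_terms):
    """Weighted sum of uncertainty terms U_j and cost terms C_l for a policy."""
    U = sum(a * u(policy) for a, u in zip(alpha, u_terms))
    C = sum(b * c(policy) for b, c in zip(beta, c_terms))
    return U + C

def feasible(policy, v_max=1.0):
    """Constraint check c(x, a) <= c_thr, here a velocity bound per action."""
    return all(abs(v) <= v_max for v, _ in policy)

# A policy is a sequence of actions a_k = (velocity, steering angle).
policy = [(0.8, 0.1), (0.9, -0.05), (0.7, 0.0)]

u_terms = [lambda p: 1.0 / (1.0 + len(p))]       # mock "uncertainty" term
c_terms = [lambda p: sum(abs(v) for v, _ in p)]  # mock travel-effort term

if feasible(policy):
    J = criterion_J(policy, alpha=[1.0], beta=[0.2],
                    u_terms=u_terms, c_terms=c_terms)
```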
2.1 Action sequence
The description of the actions a_1, …, a_N can be given in the configuration space in different ways and has a major impact on the optimization problem to be solved afterwards (Section 3). Two major groups of methods can be distinguished: those with a parameterized and those with a nonparameterized sequence of motions.
Within the framework of a parameterized sequence of actions, the AS problem is reduced to a finite-dimensional parameter optimization. The robot trajectory is considered as composed of primitives (finite sine/cosine series [12], elliptic or other functions with appealing properties) whose parameters are searched for in order to satisfy an optimality criterion. The choice of “where to look next” can then be treated as a case of optimal experiment design [6].
The methods for nonparameterized actions generate a sequence of freely chosen actions that are not restricted to a certain form of trajectory [18]. Constraints, such as maximum acceleration and maximum velocity, can be added to produce executable trajectories. A general framework for dealing with uncertainties involves optimization problems using Markov decision processes (MDPs) and partially observable MDPs (POMDPs) [19,20]. Probabilistic metrics, e.g. the Shannon entropy [21], are often used as a decision-making criterion. However, computational complexity makes POMDPs intractable for systems with many states: a mobile robot operating in the real world may have millions of possible states. Hence, exact solutions can only be found for (PO)MDPs with a small number of states; larger problems require approximate solutions, like [22], and hierarchical POMDPs.
Both parameterized and nonparameterized methods can generate locally or globally optimal trajectories, depending on the length of the path along which the optimization is performed, i.e. the trajectory is optimal with respect to the performance criterion in some segments or along the whole path. Optimal is the action that continuously directs the vehicle towards the maximum increase of information.
2.2 Performance criteria related to uncertainty
Considered in a stochastic framework, the outcome of an action is a random change in the robot state. This outcome can be
characterized by the terms U_j, which represent the expected uncertainty about the state, or this uncertainty compared to the accuracy needed for task completion. Due to the different uncertainties, a natural tool for AS is the Bayesian framework, in which the characterization of the accuracy of the estimate is based on a scalar loss function of its probability density function. Different stochastic estimators can be applied for calculating the term U_j, such as the standard Kalman filter (KF), extended or iterated KFs [23,6], the unscented KF [24], and Monte Carlo techniques [25]. Since no scalar function can capture all aspects of the information extraction, no function suits the needs of every experiment.
Commonly used functions are based on the covariance matrix of a stochastic estimator. The covariance matrix P of the state vector x is a measure for the uncertainty of the estimate. AS looks for actions which minimize this uncertainty, i.e. the covariance matrix P, or equivalently the inverse of the Fisher information matrix I = P^{-1} [26,27]. Several scalar functions of a covariance matrix can be used [28]:
(i) D-optimal design: minimizes the matrix determinant det(P), or its logarithm log(det(P)). It is invariant to any nonlinear transformation of the state vector x with a non-singular Jacobian, but it is not invariant to physical units (when the elements of x have different units, such as meters for positions and degrees for angles), nor is it suitable for verifying whether the task has been completed: det(P) being smaller than a certain value does not guarantee that the covariances of the state variables will be smaller than their toleranced values.
(ii) A-optimal design: minimizes the trace tr(P). A-optimal design does not have the invariance property when states have inconsistent units, nor does this measure allow to verify task completion.
(iii) L-optimal design: minimizes the weighted trace tr(WP). A proper choice of the weighting matrix W = MN can render the L-optimal design criterion invariant to transformations of the state vector x with a non-singular Jacobian. The covariance matrix P is normalized and scaled [6,29]: the product of the normalizing matrix N and P is invariant to physical units, and the scaling matrix M scales the elements of the product within a preset range. Tolerance-weighted L-optimal design [6] proposes a natural choice of W depending on the desired tolerances of task completion.
(iv) E-optimal design: minimizes the maximum eigenvalue λ_max(P). Like A-optimal design, this function is not invariant to transformations of x, but it allows for the verification of task completion.
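As an illustration, the four scalarizations can be computed directly from a covariance matrix; the numbers below, including the tolerance-based weighting W, are hypothetical.

```python
import numpy as np

# The four scalarizations of the covariance matrix P discussed above,
# written out directly; W is an illustrative tolerance-based weighting.

P = np.array([[0.04, 0.01, 0.0],   # example covariance: x [m], y [m], phi [rad]
              [0.01, 0.09, 0.0],
              [0.0,  0.0,  0.02]])

d_opt = np.linalg.det(P)                 # D-optimal: det(P) (or log det(P))
a_opt = np.trace(P)                      # A-optimal: tr(P)
tolerances = np.array([0.1, 0.1, 0.05])  # hypothetical task tolerances
W = np.diag(1.0 / tolerances**2)         # L-optimal weighting, W = M N
l_opt = np.trace(W @ P)                  # L-optimal: tr(WP)
e_opt = np.linalg.eigvalsh(P).max()      # E-optimal: largest eigenvalue
```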
The second large group of information functions is based on a probability density function p(x). The Shannon entropy [21]

H[p(x)] = −∫_{−∞}^{+∞} p(x) log(p(x)) dx   (4)
gives the average information or the uncertainty of a random variable. Introduced to quantify the transmission of information in communication channels [21], the entropy has been successfully applied in many fields, including robotics and computer vision. In vision, entropy is used to qualify the view of a scene and to determine the next best view (the one that obtains maximum information about a scene) [30]. In the AS process, entropy can characterize the robot's knowledge about its location in the environment. Entropy-based performance criteria are:
– the entropy of the posterior distribution p_post(x): E[−log p_post(x)], where E[.] denotes the expected value;
– the change in entropy between two distributions, the prior p_1(x) and the posterior p_2(x): E[−log p_2(x)] − E[−log p_1(x)];
– the Kullback-Leibler distance [31] (also called relative entropy), a measure for the goodness of fit or closeness of two distributions: E[log(p_2(x)/p_1(x))], where the expected value is calculated with respect to p_2(x).
The relative entropy and the change in entropy are different measures. The change in entropy only quantifies how much the form of the probability distribution changes, whereas the relative entropy also represents a measure of how much the distribution has moved. If p_1(x) and p_2(x) are the same distribution translated by different mean values, the change in entropy is zero, while the relative entropy is not. The computation of the probability density and the entropy can be performed by Monte Carlo sample-based stochastic estimators [25,13,14].
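For Gaussian densities the entropy and the relative entropy have closed forms, which makes the distinction above easy to verify numerically; the following sketch (with invented means and covariances) reproduces the translated-distribution case: zero change in entropy, nonzero relative entropy.

```python
import numpy as np

# Closed-form entropy and relative entropy for Gaussian densities, a common
# special case of the criteria above (general densities need Monte Carlo).

def gaussian_entropy(P):
    """H[N(mu, P)] = 0.5 * log((2*pi*e)^n * det(P)); independent of the mean."""
    n = P.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** n * np.linalg.det(P))

def gaussian_kl(mu2, P2, mu1, P1):
    """KL(p2 || p1) = E_{p2}[log(p2 / p1)] for Gaussians p2 and p1."""
    n = P1.shape[0]
    P1_inv = np.linalg.inv(P1)
    d = mu1 - mu2
    return 0.5 * (np.trace(P1_inv @ P2) + d @ P1_inv @ d - n
                  + np.log(np.linalg.det(P1) / np.linalg.det(P2)))

# Same covariance, shifted mean: entropy change is zero, KL is not (see text).
mu1, P1 = np.zeros(2), np.eye(2)
mu2, P2 = np.array([1.0, 0.0]), np.eye(2)
delta_H = gaussian_entropy(P2) - gaussian_entropy(P1)   # 0.0
kl = gaussian_kl(mu2, P2, mu1, P1)                      # 0.5
```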
3. OPTIMIZATION ALGORITHMS FOR ACTIVE SENSING
Active sensing corresponds to a constrained optimization of J with respect to the policy a_1, …, a_N. Depending on the robot task, sensors and uncertainties, different constrained optimization problems arise. If the sequence of actions a_1, …, a_N is restricted to a parameterized trajectory, the optimization can be expressed as linear programming, constrained nonlinear least squares, convex optimization, etc. [32]. Examples of problems formulated as optimization with respect to a set of parameters are dynamical robot identification [27] and the generation of a sinusoidal mobile robot trajectory [29], where the solution is searched for within optimal experiment design. If the sequence of actions a_1, …, a_N is not restricted to a parameterized trajectory, then the optimization problem has a different structure and falls within the Markov decision process (MDP) framework [18]. MDPs serve
as a background for solving complex problems with incomplete information about the robotic system. A Markov decision process can be described [18,22] as a tuple ⟨X, A, Pr, R⟩, where X = {x^(1), x^(2), …, x^(N)} is a finite set of states of the system, which evolves stochastically; A is a finite set of actions; and Pr: X × A → Π(X) is a state-transition function, mapping an action and a state to a probability distribution over X for the possible resulting state. The Markovian transition probability Pr(x′ | x, a) represents the probability of going from state x to state x′ with an action a. To judge the quality of an action, a reward function R: X × A → ℝ is introduced. It gives the immediate reward obtained by the agent (decision maker) in state x after taking an action a. The next state and the expected reward depend only on the previous state and the action taken (Markov property). A policy for an MDP is a mapping π: X → A that selects an action for each state. Given a policy, a finite-horizon value function of the state can be defined, V_n^π: X → ℝ, where V_n^π(x) is the expected value of applying the policy π for n steps starting in x. The value function can be written inductively, with V_0^π(x) = R(x, π(x)) and
V_m^π(x) = R(x, π(x)) + Σ_{x′∈X} Pr(x, π(x), x′) V_{m−1}^π(x′).
A policy π is considered to be better than a policy π′ if V^π(x) ≥ V^{π′}(x) for all x ∈ X, and V^π(x) > V^{π′}(x) for at least one x ∈ X. This means that a policy is optimal if it is not dominated by another policy. The following optimization problems require solving in MDPs: a finite-horizon problem, i.e. over a fixed finite number of time steps (N is finite), or an infinite-horizon problem (N = ∞). For every state it is rather straightforward to know the immediate reward associated with every action (a one-step policy); the goal, however, is to find the policy that maximizes the reward over the long term (N steps). Different optimization procedures exist for these kinds of problems, the most popular of which are:
• Value iteration: a dynamic programming algorithm that recursively calculates the optimal value function and policy [33] (see the sketch after this list). The optimization is formulated as a sequence of problems, each solved with only one of the N variables a_i.
• Policy iteration: an iteration technique over policies [34] for infinite-horizon systems. The current policy is improved repeatedly. The initial policy is chosen at random, and the process terminates when no improvement can be achieved.
• Linear programming: formulates and solves an MDP as a linear program. In practice, policy iteration tends to be faster than the linear programming approach.
• State-based search methods: represent the system as a graph whose nodes correspond to states. Tree search algorithms then search for the optimal path in the graph and can handle finite- and infinite-horizon problems [22].
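A minimal value-iteration sketch over a toy finite-horizon MDP may help fix the recursion above; the two-state transition probabilities and rewards are invented.

```python
import numpy as np

# Minimal value iteration for a finite-horizon MDP on an invented
# two-state, two-action problem.

n_states, n_actions, horizon = 2, 2, 10
# Pr[a, x, x'] = probability of moving from state x to x' under action a.
Pr = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],   # R[x, a]: immediate reward in state x, action a
              [0.0, 2.0]])

V = np.zeros(n_states)                    # V_0
for _ in range(horizon):
    # Q[x, a] = R(x, a) + sum over x' of Pr(x, a, x') * V(x')
    Q = R + np.einsum('axy,y->xa', Pr, V)
    policy = Q.argmax(axis=1)             # greedy action per state
    V = Q.max(axis=1)                     # V_m computed from V_{m-1}
```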
In fully observable MDPs the agent accurately knows what state it is in at each instant. When the information about the system state is incomplete or noisy, the solution is searched for in the group of partially observable MDPs (POMDPs). At each time step the state of the system is not known; only a probability distribution over the states can be calculated, and an optimal policy for every possible probability distribution at each time step is needed. This constitutes the key problem of this representation. In most real applications the set of states is quite large and the (PO)MDP representation is computationally expensive and not feasible. Making approximations is the only way to apply these algorithms to real systems. Exact solutions can only be found for (PO)MDPs with a small number of (discretized) states; for larger problems approximate solutions are needed, e.g. [22]. Solutions to POMDPs are usually obtained by applying dynamic programming or by directly solving the Bellman equation [33]. In the search for more elegant schemes, hierarchical POMDPs have been proposed. Bayesian networks are other tools allowing a compact representation of the transitions.
4. EXAMPLES
Results from a covariance-based parameterized approach for trajectory generation [29] and an entropy-based method [13] are presented.
Example 1. Distance and orientation sensing of a mobile robot to known beacons is addressed [29]. We consider the trajectory generation of a nonholonomic wheeled mobile robot (WMR), moving from a starting configuration (x_s, y_s, φ_s)^T (position and orientation) to a goal configuration (x_g, y_g, φ_g)^T, around a known nominal reference trajectory (x_{r,k}, y_{r,k}, φ_{r,k})^T. The vehicle motion is described by the model
x_{k+1} = x_k + v_k ΔT cos(φ_k + ψ_k) + η_{x,k},
y_{k+1} = y_k + v_k ΔT sin(φ_k + ψ_k) + η_{y,k},
φ_{k+1} = φ_k + (v_k ΔT / L) sin(ψ_k) + η_{φ,k},
with x_k and y_k the WMR position coordinates relative to a fixed frame (Fig. 1 (a)), and φ_k the orientation angle with respect to the x axis. They form the state vector x_k = (x_k, y_k, φ_k)^T. L represents the wheelbase (the distance between the front steering wheel and the axis of the driving wheels), ΔT is the sampling interval, and η_k = (η_{x,k}, η_{y,k}, η_{φ,k})^T is the process noise. The WMR is controlled through a desired velocity v_k and a direction of travel ψ_k, collected in the control vector u_k = (v_k, ψ_k)^T. Due to physical constraints, v_k and ψ_k cannot exceed boundary values: v_k ∈ [0, v_max], ψ_k ∈ [−ψ_max, ψ_max] (ψ_max ≤ π/2). The WMR can only perform forward motions. The vehicle is equipped with a sensor measuring the range r_k and bearing θ_k to a beacon B located at known coordinates (x_B, y_B)^T. The observation equation for the beacon is
r_k = √((x_B − x_k)² + (y_B − y_k)²) + ξ_{r,k},
θ_k = arctan((y_B − y_k)/(x_B − x_k)) − φ_k + ξ_{θ,k},
where z_k = (r_k, θ_k)^T is the measurement vector and ξ_k = (ξ_{r,k}, ξ_{θ,k})^T is the observation noise. η_k and ξ_k are assumed Gaussian, zero-mean and mutually uncorrelated, with covariances Q_k and R_k, respectively.
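For illustration, the motion and observation models can be simulated directly; the wheelbase, sampling time, noise covariances and beacon position below are arbitrary example values, not those of [29] (np.arctan2 replaces the arctan in the text for numerical robustness).

```python
import numpy as np

# One-step simulation of the WMR motion and beacon observation models above;
# all numeric values (L_wb, dT, Q, R, beacon position) are illustrative only.

L_wb, dT = 0.5, 0.1                      # wheelbase L [m], sampling time [s]
xB, yB = 5.0, 3.0                        # known beacon position

def motion(state, u, rng, Q=np.diag([1e-4, 1e-4, 1e-5])):
    x, y, phi = state
    v, psi = u                           # velocity and steering angle
    eta = rng.multivariate_normal(np.zeros(3), Q)
    return np.array([x + v * dT * np.cos(phi + psi),
                     y + v * dT * np.sin(phi + psi),
                     phi + (v * dT / L_wb) * np.sin(psi)]) + eta

def measure(state, rng, R=np.diag([1e-2, 1e-3])):
    x, y, phi = state
    xi = rng.multivariate_normal(np.zeros(2), R)
    r = np.hypot(xB - x, yB - y)                 # range to beacon
    theta = np.arctan2(yB - y, xB - x) - phi     # bearing to beacon
    return np.array([r, theta]) + xi

rng = np.random.default_rng(0)
state = np.array([0.0, 0.0, 0.0])
for _ in range(50):
    state = motion(state, u=(0.8, 0.05), rng=rng)
z = measure(state, rng)
```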
Figure 1. (a) WMR coordinates (b) Trajectory in the presence of multiple obstacles
The beacon location with respect to the WMR is of paramount importance for the AS task, for the accuracy and informativeness of the data. The optimal trajectory is searched for [29] in the class Q(p) of harmonic functions, where p is a vector of parameters obeying preset physical constraints. With N the number of functions, the new (modified) robot trajectory is generated on the basis of a reference trajectory through the lateral
deviation l_k (the orthogonal deviation of the robot motion from the reference trajectory in the y direction) as a linear superposition
l_k = Σ_{i=1}^{N} A_i sin(iπ s_{r,k} / s_{r,total}),   (5)
of sinusoids with constant amplitudes A_i, where s_{r,k} is the reference path length up to instant k, s_{r,total} is the total path length, and the subscript r refers to the reference trajectory. In this formulation, AS is a global optimization problem (over the whole robot trajectory) with a criterion
J = min_{A_i} {α_1 U + α_2 C}   (6)
to be minimized under constraints (on the robot velocity, steering angle, orientation angle, and distance to obstacles). α_1 and α_2 are dimensionless positive weighting coefficients. Here U is of the form
U = tr(WP),   (7)
where P is the covariance matrix of the estimated states (at the goal configuration), computed by an unscented Kalman filter [24], and W is a weighting matrix. The cost term C is taken to be the relative time
C = t_total / t_{r,total},   (8)
where t_total is the total time for reaching the goal configuration on the modified trajectory and t_{r,total} is the respective time over the reference trajectory. The weighting matrix W is written as a product of a normalizing matrix N and a scaling matrix M, W = MN, with N = diag{1/σ_1², 1/σ_2², …, 1/σ_n²}. Here σ_i, i = 1, …, n, are the standard deviations at the goal configuration on the reference trajectory, and M is the unit matrix. Simulation results for a straight-line reference trajectory and the modified trajectory, generated with different numbers of sinusoids N (in accordance with (5) and criterion (3) with terms (7) and (8)), are shown together with the uncertainty ellipses in Fig. 2 a). The evolution of the weighted covariance trace is presented in Fig. 2 b). The multisine approach gives higher accuracy than the straight-line trajectory. As seen from Figures 2 a) and b), the most accurate results at the goal configuration for U are obtained with N = 5 sinusoids. Better accuracy is obtained with bigger N, at the
cost of increased computational load. The plot in Fig. 1 (b) shows a more complex robot trajectory in an environment with many obstacles and landmarks. The robot trajectory in this storehouse is composed of different segments, three in this case (Fig. 1 (b)); even complex and long trajectories can be reduced to a sequence of straight-line segments. The end conditions of the first segment are the initial conditions for the second segment, and similarly for the other intermediate linking points. The criterion U is of the form (7). In order to ensure informative sensor measurements, each of the three segments is provided with two beacons. The minimum distance to obstacles should not be less than 0.5 m. The generated trajectory (with N = 3) and the 3σ uncertainty ellipses around it are plotted in Fig. 1 (b). The robot moves in the direction of increasing information (toward the beacons), thereby decreasing the uncertainties, as demonstrated by the changing size of the uncertainty ellipses. The trajectories generated by the parameterized sequence of actions are smooth and always obey the position constraints at the starting and goal configurations. The multisine approach is aimed at applications where enough freedom of motion is available, e.g. in large indoor and outdoor environments. No considerable gain will be reached in very constrained environments, such as very narrow corridors or very cluttered spaces: the fewer the constraints, the more effective the approach. This is clear when we account for the fact that the lateral deviation is bounded by a threshold.
Figure 2. Robot trajectories and information criterion: a) trajectories generated with (5); b) evolution of tr(WP) (7) in time for different numbers N. Uncertainty ellipses are plotted around the trajectories
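A small sketch of the lateral deviation (5) follows; the amplitudes A_i and path length are example values, and a straight-line reference along the x axis is assumed, so that (x, l) gives the modified trajectory directly. In [29] the amplitudes are instead obtained by optimizing (6).

```python
import numpy as np

# Lateral deviation (5): a superposition of N sinusoids over the reference
# path length; amplitudes are arbitrary example values, and the mapping from
# deviation to trajectory assumes a straight-line reference along x.

N = 5
A = np.array([0.4, -0.2, 0.15, 0.1, -0.05])  # example amplitudes A_i
s_total = 10.0                               # total reference path length
s = np.linspace(0.0, s_total, 200)           # reference path coordinate s_{r,k}

# l = sum over i of A_i * sin(i * pi * s / s_total); the deviation is zero at
# the start and at the goal, so the boundary configurations are preserved.
i = np.arange(1, N + 1)
l = (A[None, :] * np.sin(np.outer(s, i) * np.pi / s_total)).sum(axis=1)

x_traj, y_traj = s, l                        # modified trajectory points
```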
Example 2. This example illustrates AS based on entropy and cost minimization for the Minerva mobile robot [13] (Fig. 3 (a)) under positional uncertainty. The technique [13] is called coastal navigation, by analogy with the navigation of ships: ships often use the coastal parts of the land to
determine where they are when other advanced tools, such as GPS systems, are not available. Similarly, mobile robots need to position themselves in dynamic environments with changing obstacles such as people. The robot uses coast lines in the environment which contain enough information for accurate localization, and moves close to areas of the map with high information content. The coastal planning technique comprises two steps: (i) the information content of the environment is modeled while accounting for the sensor features and possible dynamic obstacles; (ii) trajectories are generated from the information model of the environment and the obstacle information in the map. The sensors used to generate the map of the environment (the National Museum of American History) are laser range finders. Each cell of the map is characterized by an information content, corresponding to the ability of the robot to localize itself there. The state vector x of the robot contains the position coordinates (x, y) and the direction θ. The robot acquires range data z from a laser sensor. The method [13] generates a map of the environment that contains the information content of each robot position. The information of the robot's current position is characterized by the difference
U = E(H(p_{x|z})) − H(p_x)   (9)

between the expected entropy of the positional probability conditioned on the sensor measurements, E(H(p_{x|z})), and the entropy of the prior distribution, H(p_x).

Figure 3. (a) Minerva robot (b) Average entropy over trajectories in the museum
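The information term (9) can be sketched on a small discrete localization example: for each possible reading z, the posterior is obtained by Bayes' rule and its entropy is averaged under p(z). The four-cell prior and two-reading sensor model below are invented.

```python
import numpy as np

# Discrete sketch of the information term (9): expected posterior entropy
# minus prior entropy, over a small grid of robot positions. The sensor
# model p(z | x) is an invented two-reading example.

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

prior = np.array([0.25, 0.25, 0.25, 0.25])   # p(x) over 4 cells
# likelihood[z, x] = p(z | x) for two possible range readings z.
likelihood = np.array([[0.9, 0.7, 0.2, 0.1],
                       [0.1, 0.3, 0.8, 0.9]])

p_z = likelihood @ prior                      # p(z) = sum_x p(z|x) p(x)
expected_posterior_H = 0.0
for z in range(likelihood.shape[0]):
    posterior = likelihood[z] * prior / p_z[z]   # Bayes rule: p(x|z)
    expected_posterior_H += p_z[z] * entropy(posterior)

U = expected_posterior_H - entropy(prior)     # Eq. (9); negative = info gain
```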
Figure 4. Robot trajectories (courtesy of W. Burgard, Univ. of Freiburg, Germany)
This term, together with a cost associated with travel, both weighted with coefficients as in (3), is to be minimized in coastal planning. For conventional planning, only the cost term is accounted for. The cost term is formed by the probabilities of the map cells: the function to be minimized in the considered conventional planner is the cost of crossing a cell (x_i, y_i), whose value increases when the probability that the cell is occupied is high. Fig. 4 presents the robot trajectories generated by conventional and coastal planning for the same start and goal locations. The robot trajectory shown in Fig. 4 a) is a line through the open space, whereas in Fig. 4 b) the robot does not travel directly through the open space but moves so as to maximize the information content. The black areas show obstacles and walls; the light grey areas are areas where no information is available to the sensors. The darker grey the area, the better the information gain from the sensors. As seen, the robot follows trajectories with a lower average entropy. Fig. 3 (b) presents the average entropy as a function of the maximum range of the laser sensor. It illustrates the uncertainty reduction during the sensing process with different sensor abilities, in a static environment, for both coastal and conventional sensing. When a range sensor with an increased range (up to 10 m) is used, the results from coastal and conventional sensing are comparable, and in this case the use of conventional navigation is recommended; in other conditions, coastal navigation might be more suitable. The multisine approach performs a global type of planning: it generates the whole trajectory and makes use of a sequential quadratic programming method for optimization. The approach of Example 2 performs a local type of planning (at each time instant the information criterion is computed and a decision for the robot movement is made), based on optimization via dynamic programming (the Viterbi algorithm).
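To convey the flavour of coastal planning, the sketch below plans over a grid in which each step pays a travel cost plus a weighted entropy cost for the cell entered; Dijkstra's algorithm is substituted for the dynamic programming (Viterbi) planner of [13], and the entropy map is invented.

```python
import heapq
import numpy as np

# Coastal-flavoured planning sketch: Dijkstra over a grid where each step
# pays a travel cost plus a weighted entropy cost for the cell entered.
# (Substituted for the Viterbi planner of [13]; the entropy map is invented.)

def plan(entropy_map, start, goal, weight=2.0):
    rows, cols = entropy_map.shape
    dist, prev = {start: 0.0}, {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + 1.0 + weight * entropy_map[nr, nc]
                if nd < dist.get((nr, nc), np.inf):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    path, cell = [goal], goal                # walk back from goal to start
    while cell != start:
        cell = prev[cell]
        path.append(cell)
    return path[::-1]

entropy_map = np.array([[0.9, 0.9, 0.9],    # open space: poorly localizable
                        [0.2, 0.9, 0.2],    # cells near walls: low entropy
                        [0.2, 0.2, 0.2]])
path = plan(entropy_map, start=(0, 0), goal=(2, 2))
```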
5. CONCLUSION
Active robotic sensing incorporates various aspects and has many applications, among others active vision and autonomous robot navigation. In this paper we present a probabilistic decision-theoretic framework for making decisions. AS is considered as a multi-objective optimization process for determining whether the result of one action is better than the result of another, and frequently used statistical decision-making strategies are considered. Even though AS tasks differ considerably in their sensors and applications, e.g. force-controlled manipulation versus autonomous mobile robot navigation, the optimality criteria are usually composed of two terms: a term characterizing the uncertainty minimization, i.e. the maximization of information content, and a term for costs such as traveled path or total time. Further investigations are directed toward AS tasks with one and multiple robots in a dynamic environment. AS with multiple robots requires coordination, synchronization, and collision avoidance. New solutions are needed in order to reach a reasonable balance between complexity and high performance.
ACKNOWLEDGEMENTS
Financial support of the Fund for Scientific Research-Flanders (F.W.O.-Vlaanderen) in Belgium, K. U. Leuven's Concerted Research Action GOA99/04, and the Center of Excellence BIS21/ICA1-2000-70016 is gratefully acknowledged.
REFERENCES
1. Bajcsy, R., Real-time obstacle avoidance algorithm for visual navigation, Proc. of the 3rd Workshop on Computer Vision: Represent. and Control, pp. 55-59, 1985.
2. Bajcsy, R., Active perception, Proc. of the IEEE, Vol. 76, pp. 996-1005, 1988.
3. Aloimonous, Y., I. Weiss, A. Bandopadhay, Active vision, Intern. J. of Computer Vision, Vol. 1, No. 4, pp. 333-356, 1987.
4. Ballard, D., Animate vision, Artificial Intelligence, Vol. 48, No. 1, pp. 57-86, 1991.
5. Marchand, E., F. Chaumette, An autonomous active vision system for complete and accurate 3D scene reconstruction, Int. J. of Comp. Vision, pp. 171-194, 1999.
6. De Geeter, J., J. De Schutter, H. Bruyninckx, H. Van Brussel, M. Decreton, Tolerance-weighted L-optimal experiment design: a new approach to task-directed sensing, Adv. Robotics, Vol. 13, No. 4, pp. 401-416, 1999.
7. DeSouza, G., A. Kak, Vision for mobile robot navigation: A survey, IEEE Trans. PAMI, Vol. 24, No. 2, pp. 237-267, 2002.
8. Denzler, J., C. Brown, Information theoretic sensor data selection for active object recognition and state estimation, IEEE Trans. PAMI, pp. 145-157, 2002.
9. Zhang, H., J. Ostrowski, Visual motion planning for mobile robots, IEEE Trans. on Rob. and Aut., Vol. 18, No. 2, pp. 199-207, 2002.
10. Madsen, C., H. Christensen, A viewpoint planning strategy for determining true angles on polyhedral objects by camera alignment, IEEE Trans. PAMI, Vol. 19, No. 2, pp. 158-163, 1997.
11. Moon, I., J. Miura, Y. Shirai, On-line viewpoint and motion planning for efficient visual navigation under uncertainty, Rob. and Aut. Syst., pp. 237-248, 1999.
12. Laumond, J.-P., Robot motion planning and control, Springer-Verlag, 1998.
13. Roy, N., W. Burgard, D. Fox, S. Thrun, Coastal navigation - mobile robot navigation with uncertainty in dynamic environments, Proc. of IEEE Int. Conf. on Rob. and Aut., 1999.
14. Burgard, W., D. Fox, S. Thrun, Active mobile robot localization by entropy minimization, Proc. of the 2nd Euromicro Workshop on Adv. Mobile Rob., 1997.
15. Liu, S., L. Holloway, Active sensing policies for stochastic systems, IEEE Trans. on AC, Vol. 47, No. 2, pp. 373-377, 2002.
16. Lim, H.-L., L. Holloway, Active sensing for uncertain systems under bounded-uncertainty sensing goals, Prepr. of the 13th World Congr. of IFAC, USA, 1996.
17. Hager, G., M. Mintz, Computational methods for task-directed sensor data fusion and sensor planning, Intern. J. Rob. Research, Vol. 10, No. 4, pp. 285-313, 1991.
18. Kaelbling, L., M. Littman, A. Cassandra, Planning and acting in partially observable stochastic domains, Artif. Intell., Vol. 101, No. 1-2, pp. 99-134, 1998.
19. Cassandra, A., L. Kaelbling, M. Littman, Acting optimally in partially observable stochastic domains, Proc. of the 12th Nat. Conf. on AI, pp. 1023-1028, 1994.
20. Thrun, S., Monte Carlo POMDPs, Adv. in Neural Inf. Proc. Syst. 12, MIT Press, pp. 1064-1070, 1999.
21. Shannon, C., A mathematical theory of communication, I and II, The Bell System Techn. Journ., Vol. 27, pp. 379-423 and pp. 623-656, 1948.
22. Boutilier, C., T. Dean, S. Hanks, Decision-theoretic planning: structural assumptions and computational leverage, J. of AI Res., Vol. 11, pp. 1-94, 1999.
23. Bar-Shalom, Y., X. R. Li, Estimation and tracking: principles, techniques and software, Artech House, 1993.
24. Julier, S., The scaled unscented transformation, Proc. of the Amer. Contr. Conf., pp. 4555-4559, 2002.
25. Doucet, A., N. de Freitas, N. Gordon, Eds., Sequential Monte Carlo Methods in Practice, Springer-Verlag, 2001.
26. Fisher, R., On the mathematical foundations of theoretical statistics, Philosophical Trans. of the Royal Society of London - A, Vol. 222, pp. 309-368, 1922.
27. Swevers, J., C. Ganseman, D. Bilgin, J. De Schutter, H. Van Brussel, Optimal robot excitation and identification, IEEE Trans. on Rob. and Aut., Vol. 13, pp. 730-740, 1997.
28. Fedorov, V., Theory of Optimal Experiments, Academic Press, NY, 1972.
29. Mihaylova, L., J. De Schutter, H. Bruyninckx, A multisine approach for trajectory optimization based on information gain, Rob. and Aut. Syst., pp. 231-243, 2003.
30. Vázquez, P., M. Feixas, M. Sbert, W. Heidrich, Viewpoint selection using viewpoint entropy, in: T. Ertl, B. Girod, G. Greiner, H. Niemann, H.-P. Seidel (Eds.), Vision, Modeling, and Visualization 2001, pp. 273-280.
31. Kullback, S., On information and sufficiency, Ann. Math. Stat., pp. 79-86, 1951.
32. NEOS, Argonne National Laboratory and Northwestern University, Optimization Technology Center, 2002, http://www-fp.mcs.anl.gov/otc/Guide/.
33. Bellman, R., Dynamic Programming, Princeton Univ. Press, New Jersey, 1957.
34. Howard, R., Dynamic Programming and Markov Processes, The MIT Press, 1960.
Bellman R., Dynamic Programming, Princeton Univ. Press, 1957, New Jersey. Howard, R., Dynamic Programming and Markov Processes, The MIT Press, 1960.