Application of Voting to Fusion of Purposive Modules: An Experimental Investigation P. Pirjanian Laboratory of Image Analysis, Aalborg University, DK-9220 Aalborg, Denmark
H. I. Christensen Centre for Autonomous Systems, Royal Institute of Technology, S-100 44 Stockholm, Sweden
J. A. Fayman Department of Computer Science, Technion - Israel Institute of Technology, 32000 Haifa, Israel
Abstract

Control in behavior-based systems is distributed among a set of specialized behaviors. To achieve efficient implementation, behaviors exploit specific assumptions about a given task and environment. Thus they become vulnerable to deviations that render these assumptions invalid. Yet it is important to provide appropriate responses to unforeseen situations. We demonstrate that, using voting techniques, a model-free approach may be provided to constructing reliable behaviors from a multitude of less reliable ones. A team of complementary behaviors vote for the set of possible actions, and the action which is most favored is selected for controlling the system. We conjecture that selecting actions according to this scheme can improve the probability of success. Our conjecture is investigated through two sets of experiments. In the first, a team of obstacle avoidance behaviors vote to guide a mobile robot platform in the most appropriate direction. In the second, four object tracking modules are integrated to perform smooth pursuit with a camera head.

Key words: purposive vision, behavior-based control, voting, command fusion
Preprint submitted to Elsevier Preprint
9 February 1998
1 Introduction

The architectural organization of intelligent autonomous systems has undergone considerable conceptual and practical changes. We have in the past decade witnessed a shift from centralized sense-model-plan-act architectures to distributed behavior-based architectures. In computer vision the shift has gone from reconstructive vision [20] to the purposive vision paradigm [1-3], and in robotics it has gone from symbolic planning [13] to the behavior-based paradigm [6]. See [7] for a discussion of computer vision paradigms and [5] for robotics paradigms. The robotics counterparts of reconstructive vision and purposive vision are the symbolic planning and behavior-based approaches, respectively.

Some general similarities can be drawn between the corresponding approaches in computer vision and robotics. One fundamental characteristic common to the reconstructive vision and symbolic planning approaches is that they are usually not tied to any particular application. They propose to construct modules that solve general problems and can have various applications. For example, in reconstructive vision, inspired by David Marr's work [20], emphasis is on the recovery of three-dimensional properties from two-dimensional images using certain cues. In the recovery school it is believed that, if a module can robustly recover 3D structure, then it can be applied to many other specific problems such as obstacle avoidance. Robust solutions to such problems have yet to be discovered. Additionally, reconstruction techniques usually involve a very large amount of computation and are thus not suitable for dynamic environments.

The purposive vision and behavior-based approaches, respectively, propose to find dedicated modules as solutions to problems with a specific task in a specific environment [1]. Thus, rather than attempting to find specific solutions to general problems, these approaches seek general solutions to specific problems. For example, rather than implementing obstacle avoidance as an application of a structure-from-motion module, the purposive approach proposes to implement a dedicated (purposive) module that enables obstacle avoidance and obstacle avoidance only. Systems built using this approach have displayed significant success in the fields of computer vision and robotics, as witnessed by the annual AAAI Robot Competition [5,21].

In a behavior-based system, each behavior extracts only the information required to complete a given task. The reduced computational demands, due to selective information processing, allow a behavior-based system to be responsive to run-time contingencies. However, this selective processing of sensor information can have certain drawbacks. The designers of highly specialized behaviors exploit very specific knowledge about the task and the environment to come
up with efficient and economic implementations. In [15] Horswill develops a design methodology for the specialization of visual modules. Physical, geometrical and other types of knowledge are incorporated as implicit or explicit assumptions in the algorithms. Such specialized modules or behaviors operate reliably as long as their assumptions are valid; if they are placed in environments which they are not programmed to handle, they may fail completely due to the invalidation of their assumptions. System failure is usually associated with a certain cost and unacceptable casualties; thus it is important for a system to reduce the likelihood of failures.

In robotics, the environment is usually so complex that one cannot hope to foresee all possible scenarios and problems in advance. Therefore, the robot should be adaptive to unforeseen situations and provide appropriate responses to events that it has not explicitly been programmed to handle. Most approaches to this problem utilize redundant sources of information to make systems insensitive to incompleteness, noise and uncertainties. Most earlier approaches rely on various forms of models to improve the reliability of the system. In some cases the model is a probabilistic model that reflects the robot's interactions with its surroundings (see [16]), and in other cases a performance model is used to determine which system components or parameters are suitable to a given context (see [23,10]). However, the performance of these systems can be highly sensitive to the accuracy of the utilized models. In complex environments and tasks, such models are cumbersome to obtain and their accuracy is hard to guarantee [23]. Thus it might be necessary to use schemes that do not rely on explicit models.
2 Outline

In section 3, we review some of the work done in the area of uncertainty handling for construction of reliable systems or system components. In section 4, we describe a model-free approach for constructing reliable behaviors from a multitude of less reliable ones. The proposed method can be characterized as command fusion as opposed to sensor fusion. Simple voting schemes are used to combine the commands suggested by a set of redundant behaviors (i.e., behaviors with an identical task objective) to select the most appropriate command. The conjecture is that selecting commands according to this scheme can improve the probability of success for that task objective. In order to investigate the validity of this conjecture, two sets of experiments are conducted. In the first, a team of obstacle avoidance behaviors vote to guide a mobile robot platform in the most appropriate direction. In the second, four object tracking algorithms are integrated using a voting scheme to perform reliable smooth pursuit with a robotic stereo camera head. The experimental results are reported in section 5. The results unambiguously support the
validity of the conjecture of improved probability of success. The paper is concluded with a discussion of results in section 6.
3 Related work

Multiple sensors have been used in mobile robots to enable them to operate in diverse environments ranging from indoor navigation to navigation on roadways [9]. In the field of mobile robot navigation, Elfes [12] introduced the framework of an occupancy grid as a common representation for spatial information. Information provided by a suite of sensors is fused into this representation using probabilistic sensor models and Bayesian updating. Methods based on probabilistic models might not be suitable in certain applications, mainly because a priori probabilities can be impossible or cumbersome to obtain. Generally such models reflect the robot's interactions with its surroundings, thus they might not be adequate in other environments. Changes in the environment and task characteristics can cause partial or complete degradation of the models and hence of the system performance. Thus dynamic updating of the models might be necessary. In [25] a case-based (or context-dependent) model selection is proposed. The basic idea is that a strategy is used to determine the context the robot is currently in and then to select the model (or model parameters) best suited for the corresponding case. This approach requires techniques for reliable determination of the current situation/case and techniques to match the current situation with the ones stored in the case library. This, however, is difficult for non-trivial cases.

Other approaches to uncertainty handling include error recovery techniques. In error recovery, a set of dedicated modules monitors the task continuously and upon detection of errors (cognizant failures) an appropriate corrective action is invoked, which can handle unanticipated situations [14,11,26]. The Task Control Architecture (TCA) [26] facilitates a methodology for incremental development of reliable autonomous robots. One starts with a deliberative plan constructed to work correctly for foreseeable situations and then incrementally adds task-specific monitors and exception handling strategies (reactive behaviors) that detect and handle unpredicted situations. A similar approach is advocated in [14], where reliability is achieved through conditional sequencing. The basic idea behind conditional sequencing is to make a robot follow instructions that depend on run-time contingencies. When a situation is encountered that the robot fails to handle correctly, the system is augmented with a new component designed to handle that particular situation. In [23] a fault tolerance technique using redundant sets of behaviors has been investigated. In this approach the system is provided with a redundant set of behaviors to perform a task under different conditions. For each behavior a performance model exists, and a failure is detected if the behavior performs
worse than expected. Upon detection of a failure, a new behavior is invoked until an acceptable behavior is selected. Error detection methods usually rely on a model of the plant or process. Under conditions where it is impossible or difficult to determine the model, it might be wiser to use techniques that do not directly rely on models.

In this paper we demonstrate how voting techniques can be used to provide model-free methods of uncertainty handling. The basic idea in voting is to combine decisions from several modules to improve the probability of making correct decisions. Voting is a commonly used technique for the construction of reliable hardware and software components. In [22] an overview of various classes of voting schemes is given. In [4] a probabilistic model is adopted to characterize the performance of a group of voting techniques including majority, mean, median and plurality voting. Further, in [4] it is shown that plurality voting provides the highest probability of making the correct decision. The application of majority voting is demonstrated in pattern recognition [17], where the decisions of several classifiers are combined to achieve better recognition results. This combination can be implemented using a variety of strategies, including voting. Studies show that majority voting techniques are by far the simplest, and yet they perform as well as more complicated techniques.
4 Fusion using voting schemes

A model-free approach to uncertainty handling can avoid the potential pitfalls associated with building and utilizing models. Such a model-free approach can be provided by constructing reliable behaviors from a multitude of less reliable ones. For the most part, a given task objective can be carried out using a set of different strategies or behaviors. For example, one can imagine different implementations of obstacle avoidance behaviors based on various sensors and algorithms. Each implementation will have certain strengths and weaknesses and thus different, probably overlapping, conditions of operation. In conjunction, the behaviors may have a wider range of operation and be able to handle more situations than is possible by any of the behaviors alone.

An important question is then how to invoke the behaviors to ensure appropriate handling of a given context. One way to achieve this is to select the behavior that is most suitable for a given situation. This approach, however, requires the system to 1) reliably determine the current situation, 2) have knowledge of the suitability of each behavior and 3) select the most suitable behavior. In other words, the system should have a performance model for each behavior. As mentioned earlier, it is difficult to obtain such models with sufficient accuracy. Thus, to avoid the process of model building and utilization, a
different approach is proposed. Instead of selecting the sequence in which the behaviors should be invoked, one can let all behaviors share the control of the system simultaneously. Each behavior can vote for the set of possible actions, and the action which is most favored can be used to control the system. In this way, if a majority of the behaviors select correct control actions, then the system will become insensitive to faulty behaviors. Voting is a common technique for the construction of reliable hardware components in critical systems. Simplicity is a virtue in voting techniques and enables cost-effective and efficient hardware as well as software implementations. One main advantage of this approach is that no explicit model, probabilistic or otherwise, is used in the command fusion process. It is thus interesting to investigate how simple model-free voting techniques can be used to improve the reliability of behaviors. In a nutshell, the presented approach proceeds in the following steps:

(1) Voting: A set of homogeneous behaviors, i.e., behaviors with the same objective, vote for alternative commands. This involves, for each behavior, some sensing and processing of data.
(2) Command fusion: The votes are combined by a command fusion center.
(3) Command selection: An appropriate command is selected according to a voting mechanism, e.g., plurality voting.
(4) Control: The command is used to control the particular system.
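As an illustration only, the following Python sketch shows how these four steps might be organized in a single control cycle. The behavior interface, the discretized action list and the function names are our own assumptions and are not taken from the original system.

```python
from typing import Callable, Sequence

# Hypothetical interface: a behavior maps sensor data and a list of candidate
# actions to one vote in [0, 1] per action. The name is not from the paper.
Behavior = Callable[[object, Sequence[float]], Sequence[float]]

def control_step(behaviors: Sequence[Behavior],
                 actions: Sequence[float],
                 sensor_data: object) -> float:
    """One iteration of the vote -> fuse -> select -> control cycle."""
    # (1) Voting: each homogeneous behavior rates every candidate action.
    votes = [b(sensor_data, actions) for b in behaviors]
    # (2) Command fusion: combine the votes per action (here by averaging,
    #     i.e., plurality approval voting as defined in section 4.2).
    fused = [sum(col) / len(votes) for col in zip(*votes)]
    # (3) Command selection: pick the most favored action.
    best = max(range(len(actions)), key=lambda i: fused[i])
    # (4) Control: the selected command is returned to drive the system.
    return actions[best]
```

A caller would execute this step once per control cycle and pass the returned action to the robot's actuators.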
4.1 Behaviors & task objectives
Control in a behavior-based system is distributed among a set of behaviors, each responsible for a particular objective which is important to that particular task. For example, safe navigation to a target point may include the objectives of avoiding obstacles and moving to a target point. In this case, appropriate behaviors that should be implemented would include an obstacle avoidance behavior and a target following behavior. There is, thus, a clear correspondence between task objectives and the system behaviors that are necessary for the task (see [24] for a discussion of multiple objective decision analysis and its relation to behavior-based system synthesis). Following this line of thought, we model behaviors as objective function estimators. Each behavior estimates, for each possible control parameter (action), how well its objective is satisfied. We formalize a behavior, $b$, as a mapping from an action space, $\Theta$, to the interval $[0,1]$,
$b : \Theta \to [0, 1]$    (4.1)
The mapping in (4.1) assigns to each action $\theta \in \Theta$ a preference, where the most appropriate actions are assigned 1 and undesired actions are assigned 0. In the voting process, the outputs are interpreted as a behavior's votes for the set of alternative actions. Each action $\theta \in \Theta$ is an $N$-dimensional control variable $(\theta_1, \theta_2, \ldots, \theta_N) \in \mathbb{R}^N$. The action space $\Theta \subseteq \mathbb{R}^N$ is constrained to the set of permissible actions. The constraints reflect physical, holonomic or any other constraints which apply in a given situation. For example, consider an obstacle avoidance module where the control parameters consist of the translational and angular velocities $(v, \omega) \in \mathbb{R}^2$. If we take into consideration the speed limitations of the robot's actuators, then the action space will be constrained to $\Theta = \{(v, \omega) \mid v \in [v_{min}, v_{max}] \wedge \omega \in [\omega_{min}, \omega_{max}]\}$.

A set of functionally equivalent modules that have the same objective will be denoted homogeneous modules. The redundancy introduced by the set of complementary behaviors is exploited to improve the reliability of the system and to enable uncertainty handling. This, in fact, is a key issue in our approach to uncertainty handling, as explained in the introduction. The outputs of the modules are combined using a voting mechanism:
$\delta : \Theta \to [0, 1]$    (4.2)
and the most appropriate action chosen is
$\theta' = \arg\max \{\delta(\theta) \mid \theta \in \Theta\}$    (4.3)
Figure 1 illustrates how the outputs are combined using $\delta$, producing a new preference over the action space.
Fig. 1. Schematic overview of the voting process. The action space in this example is one-dimensional. The modules generate their votes (left) which are combined using the composition operator. The composed module is illustrated on the right.
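To make the formalism concrete, the following sketch (not part of the original work) shows a discretized, constrained action space for the (v, ω) example above and a toy behavior that returns a preference in [0, 1] for each permissible action. All numeric bounds and the shape of the preference function are invented for illustration.

```python
import numpy as np

# Hypothetical velocity limits; the paper gives no concrete numbers.
V_MIN, V_MAX = 0.0, 0.5        # translational velocity [m/s]
W_MIN, W_MAX = -1.0, 1.0       # angular velocity [rad/s]

# Discretize the constrained action space Theta = {(v, w) | v, w within limits}.
velocities = np.linspace(V_MIN, V_MAX, 11)
turn_rates = np.linspace(W_MIN, W_MAX, 21)
actions = [(v, w) for v in velocities for w in turn_rates]

def prefer_slow_and_straight(action):
    """A toy behavior b: Theta -> [0, 1] that favours low speed and small turns."""
    v, w = action
    speed_pref = 1.0 - (v - V_MIN) / (V_MAX - V_MIN)        # 1 at v = V_MIN
    turn_pref = 1.0 - abs(w) / max(abs(W_MIN), abs(W_MAX))  # 1 at w = 0
    return 0.5 * (speed_pref + turn_pref)

# One vote in [0, 1] per permissible action.
votes = {a: prefer_slow_and_straight(a) for a in actions}
```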
4.2 Command fusion using voting schemes
In the literature on reliability theory, numerous voting schemes have been proposed, and in [22] a taxonomy is given for the existing classes of voting schemes. The most general voting schemes are majority voting and m-out-of-n voting, which belong to the same class of voting schemes known as weighted consensus voting. In majority voting, an action is chosen which has received more than half of the total number of votes. In m-out-of-n voting, an action is selected if it receives m or more votes out of n. In their original formulations, however, weighted consensus voting schemes require that each module votes for only one action: the most appropriate one from the module's point of view. In the case of behaviors, more than one action can be appropriate at a time; thus a variant of weighted consensus voting, known as approval voting, is called for. In approval voting, each module or behavior is allowed to vote for any number of actions. Selecting the most appropriate action can then be based on criteria such as m-out-of-n voting; for example, majority voting selects actions that have received more than n/2 votes.

In [4] it is shown that plurality voting has a higher probability of choosing the correct action when compared to a number of voting schemes including majority voting, median voting and mean voting. To enable a theoretical characterization of voting, however, [4] utilizes a simplified probabilistic model with certain assumptions that are known not to be valid for the most part. One such fundamental assumption in the probabilistic model is the assumption of statistical independence between the voting modules. Thus, in order to verify such theoretical results it is mandatory to conduct empirical studies based on real-world experimentation. Plurality voting chooses the action that has received the maximum number
of votes. Using these results, we define the following voting scheme denoted plurality approval voting.
Definition 4.1 (Plurality Approval Voting) An approval voting scheme, $\delta : \Theta \to [0, 1]$, for a team of homogeneous behaviors $\{b_1, b_2, \ldots, b_n\}$, where $n$ is the number of homogeneous modules, is defined in the following way:
$\delta(\theta) = \frac{1}{n} \sum_{i=1}^{n} b_i(\theta)$    (4.4)
and where the most appropriate action is selected according to equation (4.3).
Thus, plurality approval voting defined in equation (4.4) corresponds to averaging of the behavior outputs. From the equation it is also seen that the action space, $\Theta$, can be continuous as well as discrete. However, in practice, due to hardware/software limitations, the behaviors will be implemented over a finite discrete set of actions.

To conclude this section, we can now state that the conjecture in this work is that the reliability of the system can be improved by fusion of homogeneous modules using plurality approval voting. The intuitive idea is that the probability of all the modules failing at the same time is less than the probability of any one module failing at a time. In this paper we shall experimentally investigate this conjecture in a mobile robot navigation and an active vision framework.
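Assuming a finite, discretized action space (as used in the experiments), Definition 4.1 reduces to averaging the behaviors' preference vectors (equation 4.4) and taking the argmax (equation 4.3). A minimal sketch, with invented example votes:

```python
import numpy as np

def plurality_approval_vote(preferences: np.ndarray):
    """Fuse an (n_behaviors, n_actions) array of votes in [0, 1].

    Returns the index of the selected action (eq. 4.3) and the fused
    preference delta(theta) (eq. 4.4), i.e., the per-action average.
    """
    fused = preferences.mean(axis=0)   # delta(theta) = (1/n) * sum_i b_i(theta)
    return int(np.argmax(fused)), fused

# Toy example: three behaviors voting over five discrete actions.
votes = np.array([[0.1, 0.9, 0.8, 0.2, 0.0],
                  [0.0, 0.7, 0.9, 0.3, 0.1],
                  [0.2, 0.4, 0.6, 0.9, 0.2]])
best, fused = plurality_approval_vote(votes)   # best == 2 for these votes
```

The same averaging applies unchanged to multi-dimensional action spaces by flattening the grid of actions into a single list.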
5 Experimental validation

The goal of our experiments is to investigate the hypothesis that fusion of homogeneous modules using voting can lead to improved reliability. Two sets of experiments are reported in the following sections. In both experiments we were not interested in algorithms which provide the best possible results; in fact, we show that it is not critical to have the most accurate implementation, as fusion tends to enhance overall performance. Therefore, the module implementations were simple and straightforward.
5.1 Obstacle avoidance experiments
A mobile robot should be able to avoid obstacles reliably while moving around in its work space. Here we report experimental results that show that the proposed fusion scheme can be used to improve the reliability of obstacle avoidance. Three purposive obstacle avoidance modules are implemented and fused using plurality approval voting. The heading direction of the robot is chosen as the control parameter, and each module has the objective of driving the robot in obstacle-free directions. The configuration used in the obstacle avoidance experiments is depicted in figure 2. The implemented obstacle avoider modules are:
VOA(L): a visual obstacle avoider that uses the left camera images. VOA(L) is a highly specialized purposive module that exploits specific task and environmental knowledge to implement an efficient algorithm for obstacle avoidance. The floor of the laboratory where the experiments are conducted is uniformly colored. Using edge enhancement techniques, VOA(L) detects locations in the image with texture. The assumption is that texture is caused by non-floor objects. Further, it is assumed that detected edge pixels correspond to points on the ground plane (i.e., the edge pixels are assumed to stem from objects on the floor). VOA(L) then calculates the world coordinates of these pixels by inverse perspective transformation [19]. The transformation of image points that actually are on the ground plane will be correct, whereas the transformed position of pixels corresponding to points above the ground plane will not be correct. This, however, is not important because wrong transformations will be projected further away, and the only information needed for obstacle avoidance is the distance to the closest obstacle.

VOA(R): a visual obstacle avoider that uses the right camera images. VOA(R) and VOA(L) are both instantiations of the same algorithm but process different inputs.

SOA: a sonar-based obstacle avoider using a ring of 24 sonars. Based on the sonar range readings provided by the sonar ring, SOA calculates the minimum distance to collision for each heading of the robot. SOA has certain assumptions about the task and the environment. Ultrasonic sensors provide distance information based on the time-of-flight concept: an ultrasonic chirp is transmitted and the distance is calculated from the elapsed time between when the chirp is transmitted and when the echo returns. The implicit assumptions here are, first, that the echo does indeed return, second, that multiple reflections of the ultrasonic chirp have not occurred, and third, that the echo is caused by the chirp transmitted by the same sensor that receives it. These assumptions are commonly known not to hold in certain situations: the emitted chirp may never return due to a large angle of reflection, and due to multiple reflections or cross-talk the calculated distance may be incorrect. (One possible mapping from sonar readings to heading votes is sketched after the figure captions below.)

Fig. 2. Fusion of the obstacle avoidance modules to control the mobile robot platform.

Fig. 3. A segment of the laboratory used for experiments. The filled triangle represents the robot at various positions on a path from the left end of the hallway to the right end. The dashed boxes represent obstacles.
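For illustration only, the sketch below shows one way a sonar-based avoider in the spirit of SOA might map a ring of range readings to heading votes. The sensor geometry, saturation range and scaling are invented for the example and do not describe the actual implementation.

```python
import numpy as np

# Hypothetical sonar ring: 24 sensors evenly spaced over 360 degrees,
# and a discrete set of candidate headings in [-90, 90] degrees.
SONAR_ANGLES = np.arange(0, 360, 15) - 180     # beam axes, robot frame [deg]
HEADINGS = np.arange(-90, 91, 15)              # candidate headings [deg]
MAX_RANGE = 3.0                                # metres; saturation distance

def soa_votes(ranges: np.ndarray) -> np.ndarray:
    """Map 24 sonar range readings to a preference in [0, 1] per heading.

    Headings whose nearest sonar reports a large free distance get high
    votes; headings pointing at close obstacles get low votes.
    """
    votes = np.zeros(len(HEADINGS))
    for i, h in enumerate(HEADINGS):
        # Use the sonar whose beam axis is closest to this heading.
        nearest = np.argmin(np.abs(SONAR_ANGLES - h))
        votes[i] = min(ranges[nearest], MAX_RANGE) / MAX_RANGE
    return votes

# Example: an obstacle roughly ahead-right yields low votes for right turns.
readings = np.full(24, MAX_RANGE)
readings[np.abs(SONAR_ANGLES - 30) < 20] = 0.4
print(dict(zip(HEADINGS.tolist(), np.round(soa_votes(readings), 2))))
```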
This team of behaviors introduces several types of redundancy:
spatial redundancy: placement of sensors at different locations
algorithmic redundancy: the sonar-based behavior uses a different algorithm than the visual behaviors
sensor redundancy: use of sonars and CCD cameras

5.1.1 Experimental setup
Four sets of experiments were conducted, each of which consisted of letting either one of the modules or the behavior team drive the robot until it stopped due to either a collision or a closed path (or a "trap"). 25 experiments were conducted for each of the modules and the behavior team, corresponding to a total of 100 runs. We measured each module's success in avoiding obstacles. The criteria for a successful run were: i) collision-free navigation and ii) non-premature termination, i.e., the only acceptable reason for stopping was either a closed path or a trap. We calculate the measure of module reliability as the ratio of its successful runs to the total number of runs (i.e., #successes/25).
5.1.2 Experimental results
The results are summarized in the last column of table 1 for each of the methods/behaviors listed in the first column. From the reliability results reported in table 1 it is evident that the behavior team has an improved reliability compared to any of the behaviors.

Table 1
Reliability results. The results of each module and the module team. Reliability is calculated as the ratio of successful runs to the total number of experiments.

Module    # Successes    # Failures    Reliability
SOA       12             13            48.0%
VOA(L)    14             11            56.0%
VOA(R)    15             10            60.0%
Voting    20             5             80.0%
In order to illustrate how the modules complement each other, we have included several plots of the behavior outputs/votes, taken from three different points along a path. During this experiment the robot was placed at the left end of the hallway depicted in figure 3 and was driven by the behavior team to the right end of the hallway. Figure 4 shows the actual outputs generated by each of the behaviors and the behavior team at the chosen points along the path. Due to the restricted field of view of the visual behaviors, only the votes of SOA are considered in the ranges [-90°, -30°) and (30°, 90°].

At point A, VOA(L) fails to detect the obstacles on both sides of the robot, and VOA(R) fails to detect the obstacles on the left side of the robot. These failures are due to their limited field of view. Fortunately, the sonars on the sides of the robot enable SOA to detect the obstacles missed by the visual behaviors, and SOA provides evidence against sharp left or right turns. Additionally, it provides further evidence for moving forward or turning 15° to the left, which is in agreement with the visual behaviors. Hence this action is selected. At point B, SOA is in partial disagreement with the visual behaviors VOA(L) and VOA(R). The front sonars fire towards the upper wall of the hallway at an inappropriate angle (> 15°), and thus SOA is not able to detect the upper wall and prefers to go straight forward. However, it detects the other walls, which are parallel to it at an ideal firing direction for the sonars. The visual behaviors, on the other hand, detect the wall (facilitated by the distinct edge at its intersection with the floor) and prefer to turn right.

These situations exemplify how a set of redundant behaviors can complement each other and contribute to system reliability. We do not claim that all scenarios will have a "happy ending" as in these cases. We merely say that the probability of happy endings will increase.
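One possible way to realize the field-of-view restriction mentioned above is to mask the visual behaviors' votes outside their field of view before averaging. The sketch below is an assumption about such an implementation; the ±30° limit comes from the text, while everything else is invented.

```python
import numpy as np

HEADINGS = np.arange(-90, 91, 15)        # candidate headings [deg]
VISUAL_FOV = np.abs(HEADINGS) <= 30      # visual behaviors only cover +/-30 deg

def fuse_headings(soa: np.ndarray, voa_l: np.ndarray, voa_r: np.ndarray) -> int:
    """Average the votes, counting the visual behaviors only inside their FOV."""
    votes = np.vstack([soa, voa_l, voa_r])
    valid = np.vstack([np.ones_like(HEADINGS, dtype=bool), VISUAL_FOV, VISUAL_FOV])
    # Per-heading average over the behaviors that can actually observe it.
    fused = (votes * valid).sum(axis=0) / valid.sum(axis=0)
    return int(HEADINGS[np.argmax(fused)])
```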
Fig. 4. Tracking results and the fusion process for the sample experiment at positions A (left) and B (right). Preferences are combined using averaging, followed by normalization. Positive angles correspond to clockwise (right) turns and negative angles to counter-clockwise (left) turns.
5.2 Smooth pursuit experiments
Smooth pursuit is the visual process by which a moving object is tracked by the eyes. The goal is to keep the retinal position of the moving object in the fovea. Smooth pursuit has many applications, e.g., in automatic surveillance systems where the "interesting" regions of a scene can be tracked. In this section we present experimental results that show how a reliable tracking module can be constructed from less reliable ones using the proposed voting scheme. Four modules for smooth pursuit are implemented:
Blob Tracking (BLOB): In our implementation, the input image is thresholded to remove any spurious pixels, and then the centroid of the above-threshold pixels is computed. Due to its simplicity and suitability for real-time implementation, blob tracking has been perhaps the single most commonly used technique for motion detection in tracking systems. Numerous works such as [8] have reported systems that are able to track a black or white blob; the source in many of these systems is a moving light such as a flashlight. However, blob tracking is unable to handle multiple objects and is not useful in realistic environments, as it assumes an object with high contrast relative to the background.

Image Differencing (IDIFF): Differencing two consecutive frames segments the scene into static and moving regions, as only objects that have changed position between two consecutive images will have non-zero pixel values. A problem with using image differencing in smooth pursuit is that retinal motion of the background induced by camera movement can be mistaken for object motion unless this motion is first subtracted out of the image.
Fig. 5. Fusion of the smooth pursuit modules to control the robotic camera head.
Edge Tracking (SOB): Edge tracking is similar in nature to blob tracking; however, rather than finding the centroid of the entire blob, the centroid of the blob edges is found. In order to find the edges, an edge operator is applied to the input image. Edge tracking algorithms for smooth pursuit assume that the object, rather than the background, is rich in edge information.

Template Matching (TM): The idea behind template matching is to detect a particular object in an image by searching for instances of a separate sub-image (template) which looks like the desired object. Correlation provides the basis of template matching. The template location which provides the maximal similarity measure is selected as the location of the object in the image. Template matching is sensitive to changes in object shape, size, orientation, and changes in image intensities.

The output of each tracking module is a triangular function with the peak at the image position of the tracked object (see figure 7). The image coordinates (x, y) are chosen as control parameters for the tracking behaviors. The fusion module selects the most appropriate action (the (x, y) parameters), converts it to camera joint angles, and drives the motors (see figure 5). The only type of redundancy provided by this team of behaviors is algorithmic redundancy. The modules process identical inputs and are thus far from statistically independent. However, they are independent as far as algorithmic assumptions are concerned.
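As an illustrative sketch only (the peak half-width, image discretization and example numbers are invented), each tracker's output can be modeled as a triangular vote over a discretized horizontal image coordinate, peaking at its position estimate, and fused by averaging as before:

```python
import numpy as np

X_COORDS = np.arange(-160, 161, 4)   # hypothetical discretized image x-axis [pixels]
PEAK_WIDTH = 40.0                    # hypothetical half-width of the triangle [pixels]

def triangular_votes(estimated_x: float) -> np.ndarray:
    """Vote profile in [0, 1] peaking at the tracker's position estimate."""
    return np.clip(1.0 - np.abs(X_COORDS - estimated_x) / PEAK_WIDTH, 0.0, 1.0)

def fuse_trackers(estimates) -> float:
    """Average the triangular votes of BLOB/IDIFF/SOB/TM-style trackers and
    return the horizontal image position the camera should be driven towards."""
    fused = np.mean([triangular_votes(x) for x in estimates], axis=0)
    return float(X_COORDS[np.argmax(fused)])

# Example: three trackers agree near x = -20 while one is distracted at x = +80;
# the fused estimate stays near the agreeing majority.
target_x = fuse_trackers([-22.0, -18.0, -20.0, 80.0])
```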
5.2.1 Experimental setup

In each experiment, a robotic manipulator holding the object to be tracked effects a horizontal translatory motion consisting of two motion segments: from the starting location to a location 200 cm to the right, and back to the starting location. In the experiments, we measured the ability of each module to track various combinations of objects and backgrounds (see figure 6).
Fig. 6. Scenarios used in experiments. Various combinations of objects and backgrounds provide differing complexities.
A total of 30 experiment sets were conducted using 6 scenarios, where each experiment set consisted of testing each of the four motion tracking modules plus the fusion module on a scenario. To account for variations in lighting and other conditions, each scenario was used for 5 experiment sets.

5.2.2 Experimental results
We define two measures for quantifying each module's ability to track: absolute error and relative error. Absolute error, shown in table 2, indicates whether a module successfully tracks during a single experiment. A module is said to track an object successfully if some part of the object is located at the image center during the entire motion sequence. The ratio of the number of successful runs to the total number of runs provides a measure of module reliability. Relative error, shown in table 3, is a measure of how well tracking is performed. As an expression for relative error we use the distance (in pixels) from the image center (where the tracked object should be in the ideal case) to a fixed point on the moving object. We term this expression distance error. (To obtain the distance error, we recorded each motion sequence to video tape, which was analyzed manually to obtain the ground truth, i.e., where the object was in the image, as a reference.)

Table 2
Absolute error. Module success and failure rates. Reliability is calculated as the ratio of successful runs to the total number of experiments.

Module        # Success    # Failure    Reliability
Blob Track    11           19           36.7%
Image Diff    14           16           46.7%
Edge Track    12           18           40.0%
Temp Match    13           17           43.3%
Fusion        24           6            80.0%

Table 3
Relative error. Module performance results for 30 runs. Relative error is presented by the mean distance error and standard deviation. The statistics for the corresponding X and Y components are also listed.

Module        Mean X    Std.Dev X    Mean Y    Std.Dev Y    Mean     Std.Dev
Blob Track    136.1     188.9        110.0     198.1        185.9    266.4
Image Diff    44.0      86.4         12.8      48.8         49.9     97.3
Edge Track    72.1      109.0        25.9      92.7         82.6     139.7
Temp Match    76.4      125.1        30.2      111.8        88.6     164.4
Fusion        17.2      24.1         5.2       25.8         19.8     34.3

Figure 7 contains plots of the actual outputs generated during one experiment. The plots show the outputs generated by the five modules over 151 frames; only the plots for the X axis are shown. The last plot within each subplot is the output of the fusion module, which combines the votes received from the individual modules and chooses the action with the maximum vote, indicated by the dot in the figures. In frame 51 it is seen that IDIFF and TM lose track of the object while fusion tracks based on information provided by BLOB and SOB. In frame 71, TM resumes tracking, and in frame 91, IDIFF resumes tracking.
Fig. 7. Sequence showing the tracking results and the fusion process (frames 1, 31, 51, 71, 91 and 111). During the sequence the camera is driven by the fusion module. The plots present each module's votes for each action (here motion in the horizontal direction). These votes are combined by the fusion module (FUSE) and the best action, indicated by the dot in the plots, is selected.
These plots indicate that, when run independently, if a module loses track of the object due to some artifact in the image which confuses the algorithm, it may not have the possibility to correct itself, because losing the object eventually causes the object to leave the region of interest (fovea) or even the image. However, with fusion this problem can be corrected, as other modules may continue to drive the object into the region of interest, where the object/motion is searched for, giving a failing module the opportunity to regain tracking. This will only work if the causes for the failures of the modules are disjoint, so that they rarely fail simultaneously.

The plot in figure 8 illustrates the performance of the fusion module as a function of scenario/setting complexity. The figure consists of two superimposed histograms, one (solid frames) showing the distribution of the per-trial number of successful modules and the other (filled area) the corresponding portion of successful fusion trials. In the figure the complexity of the experimental setting decreases from left to right: the fewer modules that succeed in a setting, the more complex it is. As can be seen, in three of the experiments none of the modules succeeds; nonetheless, the fusion module (surprisingly) succeeds in one of these three cases. Note also that in 11 of the cases only one module succeeds, but in 7 out of these 11 cases the fusion module manages to track successfully. How the fusion can succeed even though all or the majority of the individual modules fail is interesting and will be explained in the remainder of this section.
Fig. 8. Distribution of the number of successful modules at each trial during 30 trials and its relation to the number of times fusion succeeds. The filled areas correspond to the portion of times where fusion succeeds when only the given number of modules succeed in an experiment. Setting complexity decreases from left to right.
Fig. 9. Plot of distance error for the modules during the experiment. (Left) Distance error in the X (horizontal) direction. (Right) Distance error in the Y (vertical) direction.
As evident from figure 9, IDIFF, if run independently, fails to track the object through the entire experiment. However, in figure 7 it is seen that IDIFF can resume tracking when run in conjunction with the other modules. The explanation is the mechanism described above: with fusion, the remaining modules continue to drive the object into the region of interest, where the object/motion is searched for, giving the failing module the opportunity to regain tracking.
6 Discussion

Purposive modules in computer vision and behaviors in robotics exploit specific knowledge about their task and environment to come up with efficient and economic implementations. Hence they are prone to failures in situations where their assumptions about the task and the environment become invalid. In this paper we have experimentally shown that reliable modules can be constructed by a careful integration of less reliable ones using simple voting schemes.

The proposed voting technique does not utilize any explicit model of the system, its modules or its interaction with the environment. This has the advantage of avoiding the cumbersome process of model building and maintenance. However, there is ample evidence that using models that are accurate and correct can produce better results and more efficient use of resources. In [18] a maximum likelihood voting (MLV) strategy is proposed, where the reliability of each module is used to determine the most likely result, under the assumption of statistical independence. It is shown that MLV strategies have better performance than consensus voting schemes. Thus, if accurate models (statistical or other) are available or easy to obtain, they can be used with advantage. An additional, and in our opinion more important, advantage of such models is that they lend themselves to analytical characterization and thus prediction of system performance. However, theoretical results on voting algorithms [4] are often based on simplified probabilistic models with certain assumptions that are known not to be valid for the most part. Thus, in order to verify such theoretical results it is essential to conduct empirical studies similar to the one presented in this paper.
Acknowledgments This work is supported in part by the SMART-II project, funded through the EU Training and Mobility in Research (TMR) programme. The EC-IS Collaboration has in part funded this project including transportation expenses. Some of the experiments presented in this work were conducted at the Intelligent Systems Laboratory, Israel Institute of Technology. Thanks to Dr. E. Rivlin for providing the equipment and other lab. facilities. We would like to thank Prof. E. Granum, Laboratory of Image Analysis, Aalborg University, Denmark for his support and encouragement of this work. Special thanks are due to the reviewers for providing us with useful comments.
References

[1] John (Yiannis) Aloimonos. Purposive and Qualitative Active Vision. Image Understanding Workshop, Pittsburgh, Morgan Kaufmann Publishers, pages 816-828, 1990.
[2] John (Yiannis) Aloimonos et al. Integration of Visual Modules. Academic Press, Inc., 1250 Sixth Avenue, San Diego, CA 82101, 1989.
[3] Dana H. Ballard and Christopher M. Brown. Principles of Animate Vision. CVGIP: Image Understanding, 56(1):3-21, July 1992.
[4] Douglas M. Blough and Gregory F. Sullivan. A Comparison of Voting Strategies for Fault-Tolerant Distributed Systems. In 9th Symposium on Reliable Distributed Systems, pages 136-145, October 1990.
[5] Pete Bonasso and Thomas Dean. A Retrospective of the AAAI Robot Competitions. AI Magazine, 18(1):11-23, Spring 1997.
[6] Rodney A. Brooks. A Robust Layered Control System for a Mobile Robot. IEEE Journal of Robotics and Automation, 2(1):14-23, March 1986.
[7] Henrik I. Christensen and Claus B. Madsen. Purposive Reconstruction, Reply to: A Computational and Evolutionary Perspective on the Role of Representation in Vision by M.J. Tarr and M.J. Black. Laboratory of Image Analysis, Aalborg University, Denmark, November 1992.
[8] J. J. Clark and N. J. Ferrier. Modal control of an attentive vision system. In Second International Conference on Computer Vision, Tampa, Florida, pages 514-523, December 1988.
[9] S. Cooper and Hugh F. Durrant-Whyte. A Frequency Response Method for Multi-Sensor High-Speed Navigation Systems. In Proceedings of the 1994 IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, pages 1-8, October 1994.
[10] James L. Crowley and Francois Berard. Multi-Modal Tracking of Faces for Video Communication. In Computer Vision and Pattern Recognition, pages 640-645. IEEE Computer Society, June 1997.
[11] Bruce R. Donald. Error Detection and Recovery in Robotics. Springer-Verlag, 1989.
[12] Alberto Elfes. Multi-source Spatial Data Fusion Using Bayesian Reasoning. In Abidi and Gonzalez, editors, Data Fusion in Robotics and Machine Intelligence, pages 137-163, 1992.
[13] Richard E. Fikes and Nils J. Nilsson. STRIPS: A new approach to the application of theorem proving to problem solving. Artificial Intelligence, 2:189-203, 1971.
[14] Erann Gat and Greg Dorais. Robot Navigation by Conditional Sequencing. In IEEE Int. Conf. on Robotics and Automation, volume 2, pages 1293-1299, 1994.
[15] Ian D. Horswill. Specialization of Perceptual Processes. PhD thesis, Department of Electrical Engineering and Computer Science, MIT, 1993.
[16] Steen Kristensen. Sensor planning with Bayesian decision theory. In Reasoning with Uncertainty in Robotics. University of Amsterdam, The Netherlands, December 1995.
[17] Louisa Lam and Ching Y. Suen. Application of Majority Voting to Pattern Recognition: An Analysis of Its Behavior and Performance. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 27(5):553-568, September 1997.
[18] Yin-Wing Leung. Maximum Likelihood Voting for Fault-Tolerant Software with Finite Output-Space. IEEE Transactions on Reliability, 44(3):419-427, September 1995.
[19] Hanspeter A. Mallot et al. Inverse perspective mapping simplifies optical flow computation and obstacle detection. Biological Cybernetics, 64:177-185, 1991.
[20] David Marr. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. W. H. Freeman and Company, 1982.
[21] Illah Nourbakhsh, S. Morse, et al. The Winning Robots from the 1993 Robot Competition. AI Magazine, 14(4):51-66, Winter 1993.
[22] Behrooz Parhami. Voting Algorithms. IEEE Transactions on Reliability, 43(3):617-629, December 1994.
[23] David Payton, David M. Keirsey, David Kimple, et al. Do Whatever Works: A Robust Approach To Fault-Tolerant Autonomous Control. Journal of Applied Intelligence, 3:226-250, 1992.
[24] Paolo Pirjanian and Henrik I. Christensen. Behavior Coordination using Multiple-Objective Decision Making. In Paul S. Schenker and Gerard T. McKee, editors, Sensor Fusion and Decentralized Control in Autonomous Robotic Systems, volume 3209, pages 78-89. SPIE - The International Society for Optical Engineering, October 1997.
[25] Ashwin Ram, Ronald C. Arkin, et al. Case-based reactive navigation: A method for on-line selection and adaptation of reactive robotic control parameters. IEEE Transactions on Systems, Man, and Cybernetics, 27(3):376-394, June 1997.
[26] Reid G. Simmons. Structured Control for Autonomous Robots. IEEE Trans. on Robotics and Automation, 10(1):34-43, February 1994.