Exploiting Redundancy of Purposive Modules in the Context of Active Vision

Jeffrey A. Fayman¹, Paolo Pirjanian², Henrik I. Christensen², Ehud Rivlin¹

¹ Department of Computer Science, Israel Institute of Technology - Technion, Haifa 32000, Israel
{jef, [email protected]
² Laboratory of Image Analysis, Aalborg University, Aalborg East, Denmark
{paolo, [email protected]
Abstract

It is becoming increasingly clear that the robustness demonstrated by biological vision systems is in large part due to their ability to actively integrate (fuse)¹ information from a number of similar or differing visual cues [34, 2]. In addition to active integration, the perception-action nature of biological systems relies upon the ability to effectively compose behaviors in an event-driven manner. Providing mechanical vision systems with capabilities similar to those possessed by biological systems therefore entails developing appropriate tools and techniques for effective integration and behavior composition. In this paper, we address two issues. First, we present a unified approach for handling both integration and process composition, which combines a theoretical framework, providing a uniform description of teams of behaviors that are committed to achieving a specific goal through common effort, with a well-known process composition model. Our approach is implemented in a working system called the AV-shell, which provides the ability to specify and implement tasks that make use of integration and process composition according to our approach. Secondly, we address the issue of integration in the active vision activity of smooth pursuit. We have experimented with the fusion of four techniques for implementing smooth pursuit, including template matching and image differencing. We discuss each technique focusing on strengths and weaknesses, and then show that fusing the techniques according to our formal framework improves tracking.

This work was supported in part by the European Community - Israel collaboration, project ECIS-003.
¹ In this work, the terms active integration and fusion are used interchangeably.
1 Introduction

In the past several years, research into the benefits and advantages of movable visual sensory systems (as possessed by biological systems) over passive systems has been carried out in the area of active vision. It has been shown that a moving system leads to improved robustness and the elimination of ill-posed conditions in several computer vision problems [3, 5, 6]. While much of the research in active vision has focused on developing individual modules for implementing behaviors such as fixation and smooth pursuit, it is becoming increasingly clear that a robust vision system should make use of a seamless integration of a number of functionally equivalent (homogeneous) or nonequivalent (non-homogeneous) modules [34]. Indeed, integration forms the basis of a recently proposed extension of the Marr paradigm [2], in which the authors stress that future progress in computer vision will be a result of module integration: "Now that most of the modules have been studied in isolation, we think that it is time that they be tested in pairs, triples, and so on". The advantages of integration are evident in biological systems, where visual effectiveness lies in exploiting information from a variety of mechanisms, fusing the information to take advantage of strengths while avoiding the weaknesses of particular mechanisms under varying conditions [12]. While integration provides the ability to robustly implement individual modules in a vision system, the perception-action nature of active vision requires that these modules be flexibly composed in an event-driven manner. This requires powerful yet flexible process composition tools [40]. In this paper, we address two issues. First, we present a unified approach for handling both integration and process composition. The approach combines a theoretical framework, which provides a uniform description of teams of behaviors that are committed to achieving a specific goal through common effort, with a well-known process composition model. Our approach is implemented in a working system called the AV-shell, which provides the ability to easily integrate redundant multi-valued modules according to our approach and to compose processes into complex perception-action tasks.
Secondly, we address the issue of integration in the active vision activity of smooth pursuit. While a variety of techniques exist to implement smooth pursuit, each of these techniques has strengths and weaknesses that make it robust under some conditions and weak under others. The ability to fuse these techniques such that we can exploit their strengths and avoid their weaknesses naturally leads to more robust system behavior. We have experimented with the fusion of four techniques for implementing smooth pursuit, including template matching and image-difference-based tracking. We discuss each technique focusing on strengths and weaknesses, and then show that fusing the techniques according to our formal framework improves tracking. The remainder of the paper is organized as follows. In section 2, we review related work. In section 3 we introduce the approach we have adopted for handling module composition. Next, in section 4, we discuss the theoretical framework we have adopted for the fusion of redundant multi-valued behaviors. Section 5 shows how we have unified the module composition approach of section 3 and the module integration formalism of section 4. In section 6 we present our architecture and working system, which allows us to implement tasks that exploit redundancy. In section 7, we discuss the active vision activity of smooth pursuit and focus on four motion estimation techniques for its implementation. These will be used in section 8, where we present experiments in fusing the results of the motion estimation techniques, leading to more robust smooth pursuit behavior. Finally, we conclude with section 9.
2 Related Work

Since 1985, when active vision first began to appear in the literature [3, 5, 6], interest in the topic has increased dramatically. Initial work focused on building active vision devices and on understanding and transferring to these devices capabilities possessed by biological vision systems, such as saccades [9], smooth pursuit [15, 9], fixation [33], attention [45], and prediction [11]. While several of these works point out that these capabilities are often composed of various cues (e.g. disparity and accommodation in vergence [33, 13]), to the best of our knowledge, no previous work exists in which the benefits of fusing a number of homogeneous modules are exploited to improve individual visual capabilities. Uhlin et al. point out [42] that in order to operate in a rich environment, a vision system should be capable of using and integrating many cues. Several researchers have explored the area of module integration in computer vision. Pankanti et al. [34] integrate modules for perceptual organization, shape from shading, stereo and line labeling, and show the improved performance of the integrated system. Integration forms the basis of a proposed extension of the Marr paradigm [2]. We too recognize the importance of module integration and present a system providing these capabilities for homogeneous modules. Several active vision systems have been developed; however, capabilities for the fusion of homogeneous modules are not explicitly provided by these systems. In [16], Crowley and Christensen discuss a sophisticated architecture and system called VAP-SAVA for the integration and control of a real-time active vision system. The VAP-SAVA system was designed specifically for the purpose of demonstrating continuous operation in the context of integration. It possesses many of the features required in a system for active vision experiments; however, it is desirable to have programming facilities that enable flexible composition and integration of processes to support a variety of architectures and applications. In [23], Firby et al. present an "Animate Agent Architecture". In their system, visual routines, which are composed of combinations of primitive visual operators, are paired with action routines. These pairs, called "reactive skills", are designed to run in tight perception-action cycles to meet real-time constraints. The system proposed by Firby et al. possesses many of the features required in a system for active vision experiments. Our work differs in that we provide a portable system, including a language, for exploiting integration and specifying process composition.
Process composition by representing tasks/plans as networks of processes was first proposed by Lyons et al. [30, 31, 29], where the Robot Schemas model is discussed. Kosecka et al. [26] adopt Robot Schemas and show how one can synthesize a finite state machine supervisor which serves as a discrete event controller. Elementary behaviors appropriate in the domain of an "intelligent delivery agent" are described and experiments in robot navigation are presented. Our work also makes use of the Robot Schemas model; however, we recognize the importance of module fusion to the robust behavior of active vision systems and extend the power of Robot Schemas to encompass both module integration and module composition in a unified manner. Reliability in robotics has been studied along several main paths: reactivity, error recovery and uncertainty handling².

² Uncertainty handling can be further divided into the two related areas of explicit uncertainty handling and methods exploiting redundancy (such as sensor fusion).
Reactive behavior-based systems provide immediate responses to unpredictable environmental changes through a tight coupling of perception and action. Reactive architectures have shown improved reliability when compared with classical sense-plan-act architectures. The behavior-based approach has several advantages over the classical methods of creating autonomous mobile robots. In behavior-based systems, intelligent behavior is achieved by combining a set of behaviors, each connecting perception to action, running concurrently and in parallel. Basic reactions to stimuli are combined to generate resultant (emergent) behaviors. Representative of such systems is Brooks' subsumption architecture [10], in which the robot architecture consists of a set of hardwired behavioral (reactive) modules. The reduced computational demands allow a behavior-based system to operate in real time and in changing environments. However, the behavior-based approach has its own limitations. The interactions and possible couplings between behaviors are unknown a priori, but they may be crucial to the overall stability of the system. The coupling problem worsens as the number of behaviors increases, which is known as the scaling problem. It has been shown that a combinatorial explosion
in the number of possible behaviors can occur when composing behaviors [4, 41]. Another important limitation of behavior-based systems is their lack of explicit goals and goal-handling capabilities. In error recovery techniques, a set of dedicated modules monitor the task continuously and, upon detection of an error (cognizant failure), an appropriate corrective action is invoked, which can handle unanticipated situations. Error detection and recovery was first introduced to robotics in [17]. The Task Control Architecture (TCA) [39] facilitates a methodology for incremental development of reliable autonomous agents. One starts with a deliberative plan constructed to work correctly for foreseeable situations and then incrementally adds task-specific monitors and exception handling strategies (reactive behaviors) that detect and handle unpredicted situations. A similar approach is advocated in [24], where reliability is achieved through conditional sequencing. The basic idea behind conditional sequencing is to make a robot follow instructions that depend on run-time contingencies which occur through the interaction of the robot with its surroundings. Both approaches have displayed significant success in improving the overall reliability of robots with respect to situation handling capabilities; that is, they provide a framework for the systematic construction and organization of situation handling strategies (primitive behaviors). However, the reliability of such a system depends heavily on the reliability of the primitive behaviors or context-dependent monitors, since the failure of these to handle the situations they are conditioned for (due to failure of, e.g., hardware or algorithms) can cause a failure of the whole task/system. These behaviors should have uncertainty handling capabilities and be able to cope with conflicting and/or incomplete information. Thus the reliability of the primitive behaviors themselves is highly critical and should be characterized. Even though these methods are supported by specific architectures and utilities, they remain ad hoc and heuristic, which limits their practical use. Our contribution is to provide a formal approach for constructing reliable behaviors from less reliable ones. The basic idea behind uncertainty handling techniques is to have explicit "models" of the robot's
sensing and action capabilities. Using this model, the agent (robot) can predict what actions to take in order to increase the expected utility or the reliability of its task. Various combinations, such as reactivity and uncertainty handling, have also been exploited to an extent. The usual approach is to combine (high-level) deliberative planners that have uncertainty handling capabilities with (low-level) reactive modules (behaviors) which handle run-time contingencies [37, 27]. The most popular approaches for reasoning under uncertainty are based on probability theory [36], Dempster-Shafer theory [38], fuzzy set theory [46] and the certainty factor formalism. Methods that exploit redundancy are usually based on data/sensor fusion techniques [1, 19], where reliability is improved by pooling evidence. Sensor integration/fusion has the potential of reducing overall uncertainty, overcoming sensor imperfections and producing more reliable results. In [1] image segmentation is performed in a framework in which fuzzy set theory is adopted for sensor fusion, where uncertainty is modeled using membership functions. A major drawback when using fuzzy set theory is that the estimation of the fuzzy sets or membership functions can be cumbersome if no clear guidelines can be found. Elfes [20] introduced the framework of the occupancy grid as a common representation for spatial information. Information provided by a suite of sensors is fused into this representation using probabilistic sensor models and Bayesian reasoning. These approaches are centralized in the sense that data are fused in a data fusion center. In [32] a decentralized approach to data fusion is introduced. This approach is based on Bayesian reasoning and utility theory. The Bayesian methods, however, might not be suitable in certain applications, mainly for two reasons. First, a priori probabilities can be impossible or cumbersome to obtain. Second, in many cases one needs to express ignorance with regard to specific choices. Under these conditions other techniques are called for. Dempster-Shafer theory is suitable for handling ignorance and does not need a priori probabilities; however, in the general case Dempster's rule of combination has exponential computational complexity. In specific cases, however, this can be circumvented and tractable implementations are possible. See [25, 28] for applications of Dempster-Shafer theory to sensor fusion. In this paper we advocate a scheme for handling uncertainties which can potentially be based on any of the above-mentioned methods.
3 Module Integration and Composition

Module integration and composition can be viewed as belonging to two different domains: quality and time, respectively. Additionally, we can distinguish two forms of integration: integration of functionally equivalent (homogeneous) modules and integration of functionally nonequivalent (non-homogeneous) modules (i.e., sensor fusion). Our system, the AV-shell (presented in section 6), adopts the notation of the well-known Robot Schemas (RS) model, which proposes an elegant solution to the problem of module composition. However, RS does not address integration in an explicit manner. Our goal is to provide a uniform method for specifying both module composition and integration. We believe that a solution in which the RS notation is augmented with a fusion operator provides the most elegant way of achieving this goal. In this section we discuss the RS model. In the next section, we present the formalism for process integration, and then in section 5 we show how we have extended the RS model to incorporate integration in an explicit manner.

3.1 Robot schemas
The AV-shell provides an elegant process algebra proposed in [29, 30] called the Robot Schemas model. RS provides a language for specifying process concurrency and captures the temporal and structural dependencies required to implement complex perception tasks. Table 1 summarizes the RS composition operators³. In the RS model, communication channels between concurrent processes are called "ports". Messages are written to, and read from, ports.

³ The notation in Table 1 is consistent with that of [29, 30]; however, in our implementation the operators were changed to avoid conflicts with existing operators.
A port-to-port connection relation can be specified as an optional third parameter in concurrent composition. This connection relation specifies a set of couples op ↦ ip, indicating that port ip and port op are connected.

1. Sequential Composition: T = P ; Q. The process T behaves like the process P until that terminates, and then behaves like the process Q (regardless of P's termination status).

2. Concurrent Composition: T = (P | Q)_c. The process T behaves like P and Q running in parallel, with the input ports of one connected to the output ports of the other as indicated by the port-to-port connection map c. This can also be written as T = (|_{i∈I} P_i) for a set of processes indexed by I.

3. Conditional Composition: T = P<v> : Q_v. The process T behaves like the process P until that terminates. If P aborts, then T aborts. If P terminates normally, then the value v calculated by P is used to initialize the process Q_v, and T then behaves like Q_v.

4. Disabling Composition: T = P # Q. The process T behaves like the concurrent composition of P and Q until either terminates, then the other is aborted and T terminates. At most one process can stop; the remainder are aborted.

5. Synchronous Recurrent Composition: T = P<v> :; Q_v. This is recursively defined as P :; Q = P : (Q; P :; Q).

6. Asynchronous Recurrent Composition: T = P<v> :: Q_v. This is recursively defined as P :: Q = P : (Q | (P :: Q)).

Table 1: Summary of RS composition operators.

Using RS notation, we can readily see how high-level active vision activities such as fixation and pursuit can be easily specified. Fixation is initialized with a saccade to the fixation point, followed by continuous vergence control driven by disparity and accommodation cues.

fixation =
saccade; ((disparity | accommodation) :; (vergence control))
In pursuit, vergence, foveal motion detection and dynamic accommodation are used to continuously drive the motion of the vision sensor:

pursuit = (vergence # foveal motion # accommodation) :; move.
The previous discussion of process composition applies to robotic processes as well. For example, a
task which integrates an active vision device with a manipulator for repeatedly locating and grasping parts from a moving conveyor can be specified as follows:

capture = (find part : fixate) :; ((pursuit | locate part) : grasp part).
In the example, we first find the part on the conveyor, then fixate on it. Once fixated, we pursue it with the active vision device, which provides data to the arm for locating the part. Finally, we grasp the object. The purpose of the example is to illustrate the expressive power of the RS composition operators rather than how each primitive process is implemented. Composition of primitive routines into complex activities is achieved by traversing an L-R parse tree built from an expression in RS notation. For instance, pursuit is represented by the parse tree shown in Figure 1, whose root is the :; operator, with the disabling composition (vergence control # foveal motion detection # dynamic accommodation) as its left subtree and move as its right subtree.

Figure 1: Pursuit parse tree.
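To make the parse-tree representation concrete, the following Python sketch (our own illustration, not the AV-shell implementation; the Node class and the process names are hypothetical) builds the pursuit tree of Figure 1 and prints the corresponding RS expression.

    class Node:
        """A node of an RS-style expression tree: a leaf is a primitive process,
        an inner node is a composition operator applied to its children."""
        def __init__(self, op, *children):
            self.op = op                      # operator (';', '|', '#', ':', ':;', '::') or process name
            self.children = children

        def to_expr(self):
            if not self.children:             # leaf: primitive process
                return self.op
            return "(" + (" %s " % self.op).join(c.to_expr() for c in self.children) + ")"

    # pursuit = (vergence # foveal_motion # accommodation) :; move   (cf. Figure 1)
    pursuit = Node(":;",
                   Node("#", Node("vergence"), Node("foveal_motion"), Node("accommodation")),
                   Node("move"))
    print(pursuit.to_expr())                  # ((vergence # foveal_motion # accommodation) :; move)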
4 Theoretical Framework for Redundancy

In the following section we formalize a behavior as a mapping that assigns preferences to each possible action. From a behavior's point of view, the action with the highest preference can best fulfill its goal or goals. We then define teams of behaviors that are committed to achieving a specific goal through common effort, using voting. Consequently, a formal framework is presented that allows us to derive equations for the reliability of behaviors as well as behavior teams. These theoretical findings are then used to draw guidelines that can help us design teams of behaviors with improved reliability.

4.1 Behaviors and behavior teams
We formalize a behavior, b, as a mapping from an action space, Θ, to the interval [0, 1]:

b : Θ → [0, 1].    (4.1)

The action space, Θ = {θ_1, θ_2, ..., θ_n}, is defined to be a finite set of possible actions. The mapping assigns to each action θ ∈ Θ a preference, where the most appropriate actions are assigned 1 and undesired/illegal actions are assigned 0. Consider having a set of behaviors, B, all providing the same functionality, i.e., all pursuing the same goal, e.g., obstacle avoidance, grasping or door traversal. A set of functionally equivalent behaviors that pursue the same common goal will be denoted homogeneous behaviors⁴. A set of homogeneous behaviors that in combination pursue a specific goal will be denoted a behavior team, BT, characterized by a tuple

BT = (B, Θ, δ),    (4.2)

where

B = {b_1, b_2, ..., b_n} is a set of homogeneous behaviors,
Θ = {θ_1, θ_2, ..., θ_k} is an action space common to all b ∈ B,
δ : Θ → [0, 1] is an operator for composition of the (outputs of the) behaviors, B.

The outputs of the behaviors are combined using δ, and the most appropriate action chosen is θ', where δ(θ') = max{δ(θ) | θ ∈ Θ}. Figure 2 illustrates how the outputs are combined using the composition operator, producing a new preference over the action space.

⁴ Apart from having the same goal, these behaviors can be based on different sensing modalities, algorithms, etc., to achieve the same goal.

Figure 2: Schematic of the composition process in a behavior team. The behaviors generate their votes b_1(Θ), ..., b_n(Θ) (left), which are combined using the composition operator; the composed behavior δ(Θ), with its maximum at θ', is illustrated on the right.

4.2 Composition operators
Various composition operators have been suggested and studied in the literature [8]. This work is only concerned with voting schemes, which define a class of composition operators that are computationally efficient (see [35] for a survey). A general definition for a class of voting schemes, known as weighted consensus voting, is given by

Definition 4.1 (m-out-of-n Voting) An m-out-of-n voting scheme, V : Θ → [0, 1], where n is the number of homogeneous behaviors, is defined in the following way:

V(θ) = ⋆(b_1(θ), ..., b_n(θ)) if Σ_{i=1}^{n} v_i(θ) ≥ m; 0 otherwise,    (4.3)

where

v_i(θ) = 1 if b_i(θ) > 0; 0 otherwise, for i = 1, ..., n,    (4.4)

is the voting function and ⋆ : [0, 1]^n → [0, 1] is a function for combining the preferences of the n behaviors for a given action.
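As a concrete illustration of Definition 4.1, the following Python sketch (our own example; the behavior preference values and function names are made up, and simple averaging is used as one possible choice of the combination function ⋆ so that the fused preference stays in [0, 1]) computes the fused preference over a small discrete action space and picks θ' as the action with the maximum fused vote.

    def m_out_of_n_vote(preferences, m, combine=None):
        """preferences: list of n lists, the i-th giving b_i(theta) in [0,1] over a common action space.
        Returns the fused preference V(theta) of Eq. (4.3), using the votes v_i(theta) of Eq. (4.4)."""
        if combine is None:
            combine = lambda vals: sum(vals) / len(vals)   # one possible choice of the operator *
        n_actions = len(preferences[0])
        fused = []
        for a in range(n_actions):
            vals = [b[a] for b in preferences]
            votes = sum(1 for v in vals if v > 0)          # v_i(theta): behavior i votes if b_i(theta) > 0
            fused.append(combine(vals) if votes >= m else 0.0)
        return fused

    # Four hypothetical behaviors voting over five candidate actions:
    b1 = [0.0, 0.2, 0.9, 0.1, 0.0]
    b2 = [0.0, 0.0, 0.7, 0.3, 0.0]
    b3 = [0.1, 0.0, 0.8, 0.0, 0.0]
    b4 = [0.0, 0.0, 0.0, 0.6, 0.2]
    fused = m_out_of_n_vote([b1, b2, b3, b4], m=2)
    best = max(range(len(fused)), key=fused.__getitem__)   # theta' = argmax of the fused preference
    print(fused, best)                                     # action 2 wins for these values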
A behavior gives a vote for an action if its preference for that specific action is greater than 0. If at least m behaviors vote for an action θ, then their preferences are combined using ⋆, which can be a simple addition of the preferences or a more sophisticated function. Our design objective, outlined later, is to choose a voting scheme, δ, that satisfies the following requirement.

Requirement 4.1 A behavior team, BT = (B, Θ, δ), is at least as reliable as the most reliable behavior, b ∈ B, of the behavior team, i.e., Rel(BT) ≥ max{Rel(b) | b ∈ B}, where Rel(b) is a measure of reliability defined as the portion of successful runs of a behavior, b, or behavior team, BT.

The requirement formalizes the hypothesis that the reliability of the system can be improved by fusing redundant behaviors in an appropriate way. Our objective is to investigate the conditions under which this requirement is fulfilled, both theoretically and in practice. The theoretical investigation calls for a definition and characterization of the reliability of behaviors and behavior teams.

4.3 Behavior and behavior team reliability
Knowing the ground truth, i.e., the correct decision under given circumstances, we can classify the outcome of each cycle of a behavior as either a "success" or a "failure". The behavior is successful in a perception-action cycle if it makes the correct decision, and it fails if it makes the wrong decision. Thus the probability of success is defined as

P{success} = P{o = pos | s = pos} P(pos) + P{o = neg | s = neg} P(neg),

where o is the output of the behavior and s is the actual state. We then define the reliability, p, of a behavior as its probability of success. The outcome of a behavior can be described by a Bernoulli random variable, X, in the following way. If we let X = 1 when the outcome of the behavior is successful and X = 0 if it is a failure, then the probability mass function of X is given by

P{X = 1} = p,  P{X = 0} = 1 − p,    (4.5)
where p is the reliability of the behavior in question. Now consider the behaviors B of a behavior team as a system of n = |B| components, each of which either has a successful outcome or fails. Then, letting

x_i = 1 if behavior i is successful; 0 otherwise,    (4.6)

we call x = (x_1, x_2, ..., x_n) the state vector. We then define a function φ(x_1, x_2, ..., x_n) such that

φ(x_1, x_2, ..., x_n) = 1 if BT is successful under state vector (x_1, x_2, ..., x_n); 0 otherwise.    (4.7)

The function φ(x_1, x_2, ..., x_n) is called the structure function. Suppose now that the outcomes of the behaviors b_1, b_2, ..., b_n are given by a set of Bernoulli random variables, X_i, i = 1, ..., n, as described in Equation (4.5). Then let

r(p_1, ..., p_n) = P{φ(X_1, ..., X_n) = 1}.    (4.8)

This function represents the probability that the behavior team is successful (i.e., the reliability of the behavior team) when the behaviors of the team have reliabilities p_i, i = 1, ..., n. Equation (4.8) is called the reliability function. The reliability of a behavior team depends on the reliability of the individual behaviors, the number of behaviors, |B|, their statistical dependence, and the criteria for selecting the appropriate action. The latter part is determined by the voting scheme, whereas the others are either given parameters or known/assumed properties in specific situations. Thus, to calculate the reliability function we need the structure function, which depends on the voting scheme.

4.4 k-out-of-n systems
The general expression for the reliability can be facilitated by the following definition.

Definition 4.2 (k-out-of-n System) A system of n behaviors is called a k-out-of-n system if it is successful only when at least k out of its n behaviors are successful.

An m-out-of-n voting scheme is related to a k-out-of-n system by the relation k = n − m + 1; e.g., an n-out-of-n voting scheme defines a 1-out-of-n system. The structure function of a k-out-of-n⁵ system is given by

φ(x_1, x_2, ..., x_n) = 1 if Σ_{i=1}^{n} x_i ≥ k; 0 otherwise.    (4.9)

The reliability function of a k-out-of-n system can then be calculated by

r(p_1, ..., p_n) = P{X_i = 1 for at least k behaviors, i = 1, ..., n}
               = Σ_{l=k}^{n} Σ_{x ∈ σ(l,n)} Ψ(ψ_1(x), ..., ψ_n(x)),  for k = 1, ..., n,    (4.10)

where Ψ(x) is the probability that the outcome is x ∈ σ(k, n), and σ(k, n) is the set of all permutations of (the outcomes of) n behaviors where k of them are successful (1) and n − k are not (0). For example, σ(2, 3) = {(1, 1, 0), (1, 0, 1), (0, 1, 1)}. The function ψ : {0, 1}^n → [0, 1] is defined in the following way:

ψ_i(x) = p_i if x(i) = 1; 1 − p_i otherwise.    (4.11)

⁵ Note that in the reliability literature [18] a distinction is made between k-out-of-n:G and k-out-of-n:F systems. Our definition is equivalent to the first; thus, when we write k-out-of-n system we refer to a k-out-of-n:G system.
To investigate under which conditions Requirement 4.1 is satisfied, the following inequality has to be solved:

Σ_{l=k}^{n} Σ_{x ∈ σ(l,n)} Ψ(ψ_1(x), ..., ψ_n(x)) ≥ max{p_1, ..., p_n},  for k = 1, ..., n.    (4.12)

The solution to this inequality is, however, conditioned on knowledge about the statistical dependence between the behaviors and the reliabilities of the behaviors p_1, ..., p_n. If the behaviors are statistically independent, then Equation (4.10) takes the following form:

r(p_1, ..., p_n) = Σ_{l=k}^{n} Σ_{x ∈ σ(l,n)} Π_{i=1}^{n} ψ_i(x),  for k = 1, ..., n.    (4.13)
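Under the independence assumption, Equation (4.13) can be evaluated directly by enumerating all 2^n outcome vectors. The short Python sketch below (our own illustration; the reliability values are arbitrary examples, not measured data) does exactly that, and numerically exhibits the ordering r(1) ≥ r(2) ≥ ... discussed next.

    from itertools import product

    def team_reliability(p, k):
        """Reliability of a k-out-of-n team of independent behaviors with reliabilities p (Eq. 4.13)."""
        n = len(p)
        r = 0.0
        for x in product([0, 1], repeat=n):          # all 2^n outcome vectors
            if sum(x) >= k:                          # structure function phi(x) = 1, Eq. (4.9)
                prob = 1.0
                for pi, xi in zip(p, x):             # psi_i(x), Eq. (4.11)
                    prob *= pi if xi == 1 else 1.0 - pi
                r += prob
        return r

    p = [0.37, 0.47, 0.40, 0.43]                     # example behavior reliabilities (arbitrary values)
    for k in range(1, len(p) + 1):
        print(k, round(team_reliability(p, k), 3))   # r(k) decreases as k increases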
Figure 3 is a plot of Equation (4.13) as a function of the mean value of the reliabilities of the behaviors. The plot is obtained by simulation of the reliability function, where the reliability of each behavior is generated randomly (uniformly distributed) 10000 times. The solid lines in the figure are plots of a special case, where all the behaviors have the same reliability (i.e., a standard deviation of zero); this case is the cumulative binomial distribution. It is clear from the figure that Requirement 4.1 is not universally fulfilled. However, it seems that the reliability increases as a function of the mean of the reliabilities of the behaviors. Furthermore, the reliability of a k-out-of-5 system is greater than or equal to the reliability of a (k+1)-out-of-5 system, for k = 1, 2, 3, 4. This property can be shown to be valid in the general case. An analytical solution cannot be given to Equation (4.12) because it is ill-conditioned in its general form. In specific cases, however, it might be possible to solve the equation. The following theorem, which applies to the general form of Equation (4.10), can help us draw design guidelines for choosing an appropriate system.

Theorem 4.1 For a given behavior team, the reliability of a k-out-of-n system is greater than or equal to the reliability of a (k+1)-out-of-n system. Formally,

r(k) ≥ r(k + 1),  for k = 1, ..., n,    (4.14)

where for convenience we denote the reliability function of a k-out-of-n system by r(k).
Figure 3: Plot of r(p_1, ..., p_5) against randomly generated values of p_1, ..., p_5, for k = 1, ..., 5 from left to right.

Theorem 4.1 states that system reliability increases as k decreases.

Proof 4.1 (of Theorem 4.1)
Equation (4.10) can be written recursively as

r(k) = r(k + 1) + Σ_{x ∈ σ(k,n)} Ψ(ψ_1(x), ..., ψ_n(x)).    (4.15)

Since Σ_{x ∈ σ(k,n)} Ψ(ψ_1(x), ..., ψ_n(x)) ≥ 0, it is obvious from Equation (4.15) that r(k) ≥ r(k + 1).
4.5 Design guidelines
By Theorem 4.1 the reliability of a k-out-of-n system is greater than or equal to the reliability of a
(k+1)-out-of-n system, which inductively implies that a 1-out-of-n system will maximize the reliability function. It is, however, important to note that the choice of a voting scheme should not only depend on maximizing the reliability function but also on other properties, such as stability.

Definition 4.3 (Stability) A system is said to be stable if a small error in an observation only causes a small error in the final conclusion.

It can be shown that system stability decreases as k decreases. In conclusion, the reliability function increases and the stability decreases as the value of k decreases. Thus it is clear that, in choosing an appropriate voting scheme, a trade-off must be made between increasing the reliability and increasing
the stability of the system. Hence, given a set of behaviors {b_1, ..., b_n} with reliabilities p_1, ..., p_n and a minimum desired value of reliability r_min, a procedure for choosing a voting scheme could be the following (a concrete sketch of the procedure is given below):

1. If r_min > max_k r(k), go to step 5.
2. Choose a k-out-of-n system, defined by an m-out-of-n voting scheme with k = n − m + 1, with the highest k that fulfills Equation (4.12). Go to step 4.
3. Decrease k by 1.
4. If r(k) < r_min, go to step 3; else go to step 5.
5. Stop.

Step 1 checks whether it is at all possible to achieve the minimum required reliability r_min for the given set of behaviors. If not, the reliability can be improved by adding a new behavior, by improving the reliability of the behaviors already in the team, or both. Step 2 chooses a voting scheme that satisfies Requirement 4.1 with the highest possible stability. If this choice does not fulfill one's minimum requirement for reliability (step 4), then step 3 trades off stability for higher reliability.
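The sketch below is a minimal Python rendering of this selection procedure, assuming statistically independent behaviors so that r can be evaluated via Equation (4.13); all function names and the example reliabilities are our own, and this is not the AV-shell implementation.

    from itertools import product
    from math import prod

    def team_reliability(p, k):                          # Eq. (4.13), independent behaviors
        return sum(prod(pi if xi else 1.0 - pi for pi, xi in zip(p, x))
                   for x in product([0, 1], repeat=len(p)) if sum(x) >= k)

    def choose_k(p, r_min):
        """Return the k of the chosen voting scheme (m = n - k + 1), or None if r_min is unreachable."""
        n = len(p)
        if r_min > max(team_reliability(p, k) for k in range(1, n + 1)):     # step 1
            return None
        best = max(p)
        # step 2: most stable (largest) k whose team is at least as reliable as the best behavior
        satisfying = [k for k in range(1, n + 1) if team_reliability(p, k) >= best]
        k = max(satisfying) if satisfying else 1
        while team_reliability(p, k) < r_min and k > 1:                      # steps 3-4
            k -= 1                                       # trade stability for higher reliability
        return k

    print(choose_k([0.37, 0.47, 0.40, 0.43], r_min=0.7))   # -> 1 for these example values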
5 Incorporating Redundancy

The RS notation of Table 1 is based on the composition of individual processes, which are defined as a "unique locus of computation", or basic schemas, such as a piece of hardware or a physical agent of change. We are not able to describe or analyze behavior below the level of a basic schema. We augment the RS model to include fusion capabilities with the addition of the integrating composition operator fuse, which extends the definition of a schema to include the fused output of a set of modules. Although the integrating composition operator is applicable to homogeneous as well as non-homogeneous sets of modules, in this work we deal only with homogeneous modules. Given a set of homogeneous modules b_i, we define the behavior of the schema B resulting from integrating composition as follows:

B = fuse(b_1, b_2, ..., b_n)
A particular execution environment may not enable concurrent execution of processes, in which case the processes will be executed sequentially. As this is an implementation issue, it is transparent to the user. The formal description of the operator fuse is left unspecified, as there are various ways to integrate modules, such as consensus voting⁶. Note that we have not modified the existing RS operators in any way; we have only extended the definition of a basic schema in a well-defined manner. The formalism presented in section 4 shows that the reliability of schema B will be at least as good as the reliability of any of the b_i. As a result of this extension of the definition of a schema, we retain all of the analytic power of the RS model and gain the module integration benefits of section 4. To give an example of how the new operator is used, we rewrite the definition of pursuit from section 3,

pursuit = (vergence # foveal motion # accommodation) :; move,

to include module integration. Pursuit can now be defined as follows:

pursuit = (vergence # fuse(blob, idiff, edge, sob) # accommodation) :; move.

The four motion detection techniques blob, idiff, edge and sob are integrated under the fuse operator, yielding their integrated behavior. Our experiments in section 8 verify that the use of fuse leads to dramatic improvements in tracking ability over any individual technique.

⁶ How a particular integration method is practically implemented in our system is discussed in section 6.

6 A Platform and Language for Exploiting Redundancy

The process composition and integration notation/formalism presented in the previous sections has been implemented in a working system called the Active Vision Shell (AV-shell). The AV-shell can be
viewed as a programming framework for expressing perception and action routines in the context of situated robotics. The AV-shell is a powerful interactive C-shell style interface which provides many capabilities important in an agent architecture, such as the ability to combine the perceptive capabilities of active vision with the active capabilities of other robotic devices, a standardized interface to Active Vision Devices (AVDs), a standard vocabulary of active vision routines, and the means for composing and integrating the routines. In this section, we discuss the AV-shell capabilities and interface for specifying composition and redundancy, as well as the architecture upon which the AV-shell is built. We begin with an overview of a system called the R-shell [44, 43, 22], which provides the basic structure of the AV-shell, and then describe the capabilities of the AV-shell itself. For a more thorough discussion of the AV-shell, please see [21].

6.1 Robot shell
The R-shell is an interactive program written in C under the UNIX operating system. It interprets and executes robotics-related commands called "R-shell commands" in much the same way that a shell such as the C-shell interprets and executes system-related commands. Robotic data in the R-shell environment are entered, manipulated and displayed through the use of these commands. Included in the set of R-shell commands is an extensive set of algebraic operations, such as matrix and vector addition, multiplication, matrix inversion and transposition, and computation of determinants, traces and norms, as well as various arithmetic operations and standard mathematical functions for scalars, all of which are used extensively in robotics work. R-shell commands can be entered interactively or placed in R-shell scripts and executed by simply typing the script file name followed by its actual parameters. Thus scripts provide procedural abstraction in R-shell. Scripts can call other scripts and can call themselves, providing nesting and recursion. The R-shell script language provides a full set of flow control statements such as if-then-else,
while, loop and case. Additionally, R-shell supports both call-by-name and call-by-value methods for passing parameters. The R-shell has built-in capabilities for handling three main entities: arms, hands and objects. The R-shell maintains an extensive set of data structures used for the dynamic creation and manipulation of these entities. Two main modes for running R-shell currently exist: off-line and on-line. Using the off-line mode, users can enter and manipulate R-shell data, create manipulators, hands and objects, and write, test and debug scripts without endangering any physical devices (i.e., crashing the robot into a wall because of a programming error). This in effect provides a robotics simulator. After the scripts have been fully debugged, the user can invoke the on-line version of R-shell, which permits physical connection to real devices.

6.2 AV-shell
While the R-shell was designed mainly as a robotics environment, the AV-shell extends its capabilities to include active vision by providing a language for cameras and AVDs. To this end, the AV-shell defines a standardized hierarchical set of active vision routines which support high-level activities such as fixation, smooth pursuit and stabilization through a set of medium- and low-level routines such as dynamic accommodation, aperture control, saccades, vergence, convolution and histogram computation. The shell provides a uniform interface to a wide range of AVDs through data structure abstractions. In addition to the text-based simulation of the R-shell, the AV-shell provides a 3-D graphics subsystem. With this subsystem, users can control one or more graphically simulated devices simultaneously via the shell. The graphic simulator includes a simulated camera model that lets us "look" at the simulated scene from the viewpoints of multiple cameras. Cameras can be attached to devices in the environment, such as heads or manipulators, and output from the rendering pipeline can be fed to
algorithms implementing any visual process, such as optical flow computation; thus device control loops can be closed through visual input.

6.3 Module composition and integration in the AV-shell
As mentioned earlier, the AV-shell provides an extensive set of commands applicable to robotic manipulators, dextrous robotic hands, active vision devices, cameras, and objects. Additionally, the shell provides a full implementation of both the RS composition language and the process integration capabilities discussed in this paper. Using these features, users of the shell have a powerful, easy to use tool for programming perception-action tasks. The interface to the composition/integration features of the shell is given through a set of commands, which are described below.
avsmode [-nod] [-f [file]] [-l [file]] [-s file] [-p s|m]
Avsmode is used to show or modify the operation modes of the AV-shell. If no parameters are specified, this command displays the current status of the AV-shell operation modes. The avsmode options are described as follows:
  -d        Toggle AV-shell output to screen
  -f [file] Return status file
  -l [file] AV-shell log file
  -n        Put AV-shell into on-line mode
  -o        Put AV-shell into off-line mode
  -s file   Read AV-shell script file
  -p s|m    Interpretation of arc operator

def -i name [parameters] = avs-expression
This command causes an AV-shell compound parse tree to be constructed. The compound has the name name. The associated parse tree is constructed from the atoms and/or compounds and AV-shell operators given in avs-expression.

show atom
This command causes a list of the currently defined atomic processes to be displayed on the screen.

show compound
This command causes a list of the currently defined compound processes to be displayed on the screen.

show
This command causes a graphical representation of the parse tree of the specified compound to be displayed on the screen.
The AV-shell provides m-out-of-n voting fusion, which is invoked by the fuse operator. The value of m defaults to n if it is not specified as a parameter to the fuse operator. As an example, we show pursuit as specified in AV-shell notation:

define pursuit -m VAL = (vergence # fuse(blob, idiff, edge, sob) VAL # accommodation) :; move.

To instantiate a 2-out-of-4 fusion of the motion detection behavior team:

pursuit -m 2.
7 Smooth Pursuit and Cue Generation

The robustness displayed by biological vision systems is in part a consequence of their ability to update visual system parameters such that the highest quality images of salient objects are continuously delivered. Smooth pursuit is a primary activity of many biological vision systems and hence a topic of much interest in the active vision community. Smooth pursuit is the process by which a moving object is tracked; the goal is to keep the retinal position of the moving object in the fovea. Benefits provided by smooth pursuit include stabilization of the object in the image, as tracking emphasizes the signal of the target over the background, and the localization of image processing to the region of the fovea [15]. The smooth pursuit behavior team used with the fusion formalism of section 4 in our experiments (section 8) consists of the motion analysis techniques of blob tracking, edge tracking, image differencing and template matching. While many other motion analysis techniques exist, we chose these as they are relatively simple to implement and are sufficient for the purposes of our experiments.

7.1 Blob tracking
Due to its simplicity and suitability for real-time implementation, blob tracking has been perhaps the single most commonly used technique for motion detection in tracking systems. Numerous works
such as [14] have reported systems that are able to track a black or white blob. The source in many of these systems is a moving light such as a flashlight. However, blob tracking is unable to handle multiple objects and is not useful in realistic environments, as it assumes an object with high contrast relative to the background. In our implementation of blob tracking, a black blob moving over a white background is used as input. The algorithm thresholds the input image to remove any spurious pixels, then computes the centroid of the blob. Blob tracking is the fastest of our implemented motion detection techniques, yielding a sampling rate of approximately 75 Hz.
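For illustration, a thresholded-centroid step of the kind described above might look as follows (our own NumPy sketch; the threshold value is arbitrary and this is not the system's actual code).

    import numpy as np

    def blob_centroid(gray, threshold=60):
        """Return the (row, col) centroid of dark-blob pixels, or None if no blob is found."""
        mask = gray < threshold                 # dark blob on a light background
        ys, xs = np.nonzero(mask)
        if ys.size == 0:
            return None
        return float(ys.mean()), float(xs.mean())

    # Synthetic test image: light background with a dark square blob.
    img = np.full((100, 100), 220, dtype=np.uint8)
    img[40:50, 60:70] = 10
    print(blob_centroid(img))                   # approximately (44.5, 64.5)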
7.2 Edge tracking

Edge tracking is similar in nature to blob tracking; however, rather than finding the centroid of the entire blob, the centroid of the blob edges is found. In order to find the edges, an edge operator is applied to the input image. In our implementation of edge tracking, we first use the Sobel edge detector [7] to find image edges, then threshold the results to remove any spurious pixels. The centroid of the edges is then computed. Edge tracking runs at 42 Hz.

7.3 Image differencing
Image differencing is performed by differencing two consecutive frames: d(n, x, y) = f(n, x, y) − f(n−1, x, y), where d(n, x, y) is the difference image, f(n, x, y) the current frame and f(n−1, x, y) the previous frame. Differencing segments the scene into static and moving regions, as only objects that have changed position between two consecutive images will have nonzero pixel values. A problem with using image differencing in smooth pursuit is that retinal motion of the background induced by camera movement can be mistaken for object motion unless this motion is first subtracted out of the image. In our implementation of image differencing, the centroid of all pixels falling above a grey-scale threshold was computed over a two-times sub-sampled fovea of 100 × 100 pixels in the difference image. Our implementation yielded a sampling rate of approximately 20 Hz.
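The per-frame step of such a difference-based tracker could be sketched as follows (our own NumPy illustration; the grey-scale threshold is arbitrary, the 100 × 100 fovea and two-times sub-sampling mirror the values quoted above, and this is not the system's actual code).

    import numpy as np

    def difference_centroid(prev, curr, threshold=30, fovea=100, step=2):
        """Centroid of moving pixels in a sub-sampled central fovea of the difference image."""
        h, w = curr.shape
        r0, c0 = (h - fovea) // 2, (w - fovea) // 2
        a = prev[r0:r0 + fovea:step, c0:c0 + fovea:step].astype(np.int16)
        b = curr[r0:r0 + fovea:step, c0:c0 + fovea:step].astype(np.int16)
        d = np.abs(b - a)                       # |f(n, x, y) - f(n-1, x, y)|
        ys, xs = np.nonzero(d > threshold)
        if ys.size == 0:
            return None                         # no motion detected in the fovea
        # map the sub-sampled fovea coordinates back to full-image coordinates
        return float(r0 + step * ys.mean()), float(c0 + step * xs.mean())

    prev = np.zeros((240, 320), dtype=np.uint8)
    curr = prev.copy()
    curr[115:125, 155:165] = 200                # object appears near the image centre
    print(difference_centroid(prev, curr))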
7.4 Template matching
The idea behind template matching is to detect a particular object in an image by searching for instances of a separate sub-image (template) which looks like the desired object. Correlation provides the basis of template matching. For each template location, a similarity measure is computed, indicating how well the template matches the image at the current template location. The template location which provides the best similarity measure is selected as the location of the object in the image. Differences between the various template matching techniques are usually found in the method used for computing the similarity measure. Common techniques include the Sum of Squared Differences (SSD) and Normalized Cross Correlation. We use the SSD for computing the similarity measure in our implementation. The SSD similarity between a function f(x) and a template t(x) is given by

SSD(y) = Σ_{x_1=1}^{M} Σ_{x_2=1}^{N} [f(x) − t(x − y)]²,    (7.1)

where x = (x_1, x_2) ranges over the template support and M and N are the size of the template. Template matching is sensitive to changes in object shape, size and orientation, and to changes in image intensities. Our implementation of template matching, which matches a 15 × 15 pixel template against the entire 100 × 100 fovea, yields the location with the highest similarity (i.e., the smallest SSD). The implementation yielded a sampling rate of approximately 58 Hz.
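A direct (unoptimized) SSD search in the spirit of Equation (7.1) can be sketched as follows (our own NumPy illustration; the 15 × 15 template and 100 × 100 search window mirror the sizes quoted above, but the code is not the system's implementation).

    import numpy as np

    def ssd_match(image, template):
        """Return the top-left (row, col) of the window minimizing the SSD of Eq. (7.1)."""
        ih, iw = image.shape
        th, tw = template.shape
        t = template.astype(np.float64)
        best, best_pos = np.inf, (0, 0)
        for r in range(ih - th + 1):
            for c in range(iw - tw + 1):
                window = image[r:r + th, c:c + tw].astype(np.float64)
                ssd = np.sum((window - t) ** 2)
                if ssd < best:
                    best, best_pos = ssd, (r, c)
        return best_pos

    fovea = np.random.default_rng(0).integers(0, 255, (100, 100)).astype(np.uint8)
    template = fovea[40:55, 60:75].copy()       # 15 x 15 patch taken from the fovea itself
    print(ssd_match(fovea, template))           # (40, 60)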
8 Experiments

In this section, we present experiments in which we use the tools and motion analysis techniques presented in the previous sections to execute smooth pursuit while exploiting the benefits of redundancy. We begin with a discussion of what we want to show and how it can be shown. We then briefly discuss the system on which the experiments were run, followed by the actual experiments and a discussion.

8.1 Experimental setup
The goal of our experiments is to investigate the hypothesis that fusion of homogeneous behaviors can lead to improved reliability. In particular, we want to show that active vision can indeed benefit from module fusion. In order to show this, we have implemented the four cue generation modules discussed in section 7 as well as a module which fuses their results. With this configuration we ran a series of tracking experiments on a robotic head. In each experiment, a robotic manipulator effects a horizontal translatory motion consisting of two motion segments: from the starting location to a location 200 cm to the right, and back to the starting location. The speed of the manipulator was set so that the entire 400 cm manipulator motion was completed in approximately 15 seconds. In the experiments, we measured the ability of each module to track various combinations of objects and backgrounds. We call each such combination a scenario. The 6 scenarios used in our experiments are shown in figure 4. As can be seen in the figure, objects range from a simple blob to real objects. Similarly, backgrounds range in complexity from a constant beige background to a natural background with randomly placed objects.

Figure 4: Scenarios used in the experiments. Various combinations of objects and backgrounds provide differing complexities.

We tested the performance of each of the five modules (blob tracking (BLOB), image differencing (IDIFF), edge tracking (SOB), template matching (TM) and fusion (FUSE)) on each scenario. A total of 30 experiment sets were conducted using the 6 scenarios, where each experiment set consisted of testing each of the four motion tracking modules plus the fusion module on a scenario. To account for variations in lighting and other conditions, each scenario was used for 5 experiment sets (corresponding to 150 single experiments). The scenarios were chosen so as to push the limits of one or more of the modules at a time, so that they would fail. The number of failing modules thus defines an index for the complexity of a scenario. E.g., if a setting⁷, a, leads to the failure of 3 of the modules and another setting, b, leads to the failure of 1 module, then we say that setting a is more complex than setting b. This index allows us to investigate the performance of the fusion module relative to the complexity of the scenario. The histogram in figure 5(a) shows the distribution of the number of failed modules⁸ for each of the 30 experiments. E.g., in 11 of the experiments all but one of the modules failed to track the object. Two measures are used to quantify each module's ability to track: absolute error and relative error. Absolute error is a yes/no answer to the question of whether a module successfully tracks during a single experiment. A module is said to track an object successfully if some part of the object is located at the image center during the entire motion sequence; otherwise it has failed to track. The ratio of the number of successful runs to the total number of runs provides a measure of module reliability.

⁷ We call a scenario together with the lighting and other conditions during an experiment a setting.
⁸ Scenario complexity is determined based on the performance of the primitive modules; thus the fusion module is not considered in the histogram.
Figure 5: (a) Distribution of the number of failing modules at each trial during the 30 trials; the fusion module is not taken into consideration here. Setting complexity increases from left to right. (b) Distribution of the number of successful modules at each trial during the 30 trials and its relation to the number of times fusion succeeds; the filled areas correspond to the portion of trials in which fusion succeeds when only the given number of modules succeed in an experiment. Setting complexity decreases from left to right.

Relative error quantifies the quality of tracking, i.e., it is a measure of how well tracking is performed. As an expression of relative error we use the distance (in pixels) from the image center (where the tracked object should be in the ideal case) to a fixed point on the moving object. We term this expression the distance error. We also separate this distance into its X (horizontal) and Y (vertical) components, to investigate the contributions of these components to the error. We present the mean and standard deviation of the distance error, along with the means and standard deviations of the corresponding X and Y components, for the 30 experiments (see table 4). To obtain the distance error, we recorded each motion sequence to video tape. Using the video tape, we manually extracted the pixel distance between the image center and the fixed point on the moving object over the duration of the tracking⁹. This determines the "ground truth". To make the results of ground truth extraction as accurate as possible, we overlaid a '+' symbol at the image center and selected the object point located under the '+' at the start of tracking as the fixed object point for the rest of the sequence. Further, in order to analyze the behavior of the modules, we recorded their outputs, denoted tracking results, which were used to control the camera head. Since the tracking results themselves did not contain information about how well the actual tracking was going, we had to relate them to the ground truth. Thus, in order to match the ground truth with the tracking results, we logged, along with the tracking results, the sample/frame number at which the results were generated. We printed the same sample number on (the upper left corner of) each frame of the video sequence.

⁹ Every 5th frame of each video sequence was analyzed manually to obtain the distance error.

8.2 Experimental platform
All experiments were conducted in the Intelligent Systems Laboratory at the Technion - Israel Institute of Technology. The system consists of a 90 MHz Pentium PC running DOS, which is connected via Ethernet to a Sun workstation running the AV-shell under Unix. Figure 6 illustrates the Technion architecture and the Technion Robot Head (TRH) upon which the experiments were run.

Figure 6: (a) TRH system architecture (host: Sun workstation under UNIX; target: 90 MHz Pentium under DOS with a Technology 80 servo controller, BarGold IVP-150 and Motorola PC-LabCard boards, connected via Ethernet). (b) Technion Robot Head.

8.3 Cue implementation details
Each module presented in section 7 was implemented as a function in our system. For the purposes of this work, we were not interested in finding implementations which provide the best possible results; in fact, we show that it is not critical to have the most accurate implementation, as fusion tends to enhance overall performance. Therefore, our implementations were simple and straightforward. A timing analysis is given in table 2.

Module               Effective Sample Rate   Actual Sample Rate
Blob Tracking        75 Hz                   10 Hz
Image Differencing   20 Hz                   10 Hz
Edge Tracking        42 Hz                   10 Hz
Template Matching    58 Hz                   10 Hz
Fusion               10 Hz                   10 Hz

Table 2: Module timing analysis. The effective sample rate gives the speed at which each module runs independently, while the actual sample rate is the rate used in the experiments.

The effective sampling rate gives the speed at which each module can run independently. However, in order to obtain comparable results, we eliminated variations in sampling rate which may influence the performance of the modules (often, higher sampling rates lead to improved performance). We normalized the sampling rate by executing all modules sequentially while actuating the motors based only on the results of the particular module being tested.

8.4 Experimental results
Experimental results are presented in tables 3 and 4. Absolute error results are listed in table 3, along with module reliabilities. Columns 2 and 3 in the table list the number of successful runs and the number of failures, respectively. Module reliabilities, listed in the last column, are calculated as the ratio of the number of successful runs to the total number of runs (30). Table 4 summarizes the relative error results, in particular the mean and standard deviation of the distance error. The table also lists, for both X and Y, the mean value of the distance error along with the corresponding standard deviation.

Module               No. of successes   No. of failures   Reliability
Blob Tracking        11                 19                36.7%
Image Differencing   14                 16                46.7%
Edge Tracking        12                 18                40.0%
Template Matching    13                 17                43.3%
Fusion               24                 6                 80.0%

Table 3: Absolute error. Module success and failure rates. Reliability is calculated as the ratio of successful runs to the total number of experiments. The main results are highlighted.

Module               Mean X   Std.Dev X   Mean Y   Std.Dev Y   Mean    Std.Dev
Blob Tracking        136.1    188.9       110.0    198.1       185.9   266.4
Image Differencing   44.0     86.4        12.8     48.8        49.9    97.3
Edge Tracking        72.1     109.0       25.9     92.7        82.6    139.7
Template Matching    76.4     125.1       30.2     111.8       88.6    164.4
Fusion               17.2     24.1        5.2      25.8        19.8    34.3

Table 4: Relative error. Module performance results for 30 runs. Relative error is presented by the mean distance error and standard deviation. The statistics for the corresponding X and Y components are also listed. The main results are highlighted.

The plot in figure 5(b) illustrates the performance of the fusion module as a function of scenario/setting complexity. The figure consists of two superimposed histograms, one (solid frames) showing the distribution of the per-trial number of successful modules and the other (filled area) the corresponding portion of successful fusion trials. In the figure, the complexity of the experimental setting decreases from left to right: the fewer modules that succeed in a setting, the more complex it is. As can be seen, in three of the experiments none of the modules succeed; nonetheless, the fusion module (surprisingly) succeeds in one of these three cases. Note also that in 11 of the cases only one module succeeds, but in 7 out of these 11 cases the fusion module manages to track successfully. How fusion can succeed even though all or the majority of the individual modules fail is interesting and will be explained in the remainder of this section. In order to explain this phenomenon and to highlight several interesting aspects of fusion, we present data from a single experiment. The scenario used for this particular experiment is shown in figure 4, last row, second column. Figure 7 is a plot of the distance errors (the sign of the error is included in the plots) in X and Y respectively. Looking at the figures, we see that in this scenario the blob detection (BLOB), image differencing (IDIFF) and edge tracking (SOB) modules are unable to track the object. On the other hand, template matching (TM) and fusion (FUSE) succeed in tracking.
[Figure 7 appears here: two panels plotting the signed distance error (in pixels) against frame number for BLOB, IDIFF, SOB, TM and FUSE; left panel: error in X, right panel: error in Y.]
Figure 7: Plot of distance error for the modules during the experiment. (Left) Distance error in the X (horizontal) direction. (Right) Distance error in the Y (vertical) direction.

Figure 8 contains plots of the actual outputs generated during one experiment. The plots show the outputs generated by the five modules over 141 frames (with a step of 10 frames, i.e., 1 second). The output of each module is illustrated as a triangular window of width equal to 10 pixels (changes in window size are due to scaling). Only the plots for the X-axis are shown. The output of each module should be interpreted as the vote for moving the image center to a given pixel. The commanded output, chosen as the output with the maximum vote, is given to the camera-head driver, which translates the pixel values to joint angles and drives the motors. The vertical axis in each plot gives the votes, which range from 0 to 1, with 1 being the most desirable and 0 the least desirable. The last plot within each subplot is the output of the fusion module, which combines the votes received from the individual modules and chooses the action with the maximum vote, indicated by the dot in the figures. If the maximum is non-unique, one of the maxima is chosen at random.

What is interesting to note in the figure is the behavior of the individual modules and their effect on one another and on the fusion module. Frames 51, 61, 71 and 81 are particularly interesting. In frame 51 it is seen that IDIFF and TM lose track of the object, while fusion continues to track based on information provided by BLOB and SOB. In frame 71, TM resumes tracking, and in frame 81, IDIFF resumes tracking. Additionally, in subsequent frames it seems that BLOB drifts away and later resumes tracking. As is evident from Figure 7, when run independently IDIFF fails to track the object throughout the entire experiment. However, Figure 8 shows that IDIFF can resume tracking when run in conjunction with the other modules. This can be explained as follows. When a module running independently loses track of the object, due to some artifact in the image that confuses the algorithm, it may not have the opportunity to correct itself, because losing the object eventually causes the object to leave the region of interest (fovea) or even the image. With fusion this problem can be corrected by an effect which we term corrective reinforcement: other modules may continue to drive the object into the region of interest, where the object/motion is searched for, giving a failing module the opportunity to regain tracking. This only works if the causes of failure of the modules are different or disjoint, so that the modules never (or at least rarely) fail simultaneously. Agreement between modules is usually related to the specific phenomenon they are designed for (e.g., image motion); disagreement, on the other hand, should not be and usually is not related to a common phenomenon, meaning that modules may fail for different reasons.
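To make the voting mechanism of Figure 8 and the corrective-reinforcement effect concrete, the sketch below builds a triangular vote curve for each module's position estimate, combines the curves, and selects the action with the maximum fused vote, breaking ties at random. Averaging the per-module votes is an illustrative assumption; the actual combination rule is the one given by the fusion framework described earlier in the paper.

    import numpy as np

    def triangular_votes(candidates, estimate, width=10.0):
        # Votes in [0, 1] peaking at the module's position estimate and
        # falling off linearly to 0 over half the window width.
        return np.clip(1.0 - np.abs(candidates - estimate) / (width / 2.0), 0.0, 1.0)

    def fuse(candidates, estimates, rng=None):
        # Combine per-module vote curves (here by averaging, an assumption)
        # and return the candidate action with the maximum fused vote.
        rng = rng if rng is not None else np.random.default_rng()
        votes = np.mean([triangular_votes(candidates, e) for e in estimates], axis=0)
        best = np.flatnonzero(votes == votes.max())     # all tied maxima
        return candidates[rng.choice(best)]             # random tie-break

    # Example: three modules roughly agree, a fourth has lost the target;
    # the fused maximum still lies near the agreeing modules.
    candidates = np.arange(-20, 21)                     # candidate horizontal shifts (pixels)
    print(fuse(candidates, estimates=[-4.0, -5.0, -3.5, 12.0]))

Because the agreeing modules reinforce each other, a module that momentarily fails does not pull the fused maximum away; this is what allows it to regain the target once the object is driven back into its region of interest.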
9 Concluding Remarks

Biological vision systems are remarkably adept at providing useful, high-quality visual information in rich, dynamic environments. It is becoming increasingly apparent that these capabilities are, in part, a result of the inherent ability of such systems to effectively integrate data from a wide range of visual cues as well as to compose modules in a timely, event-driven manner. In this paper, we have explored various aspects of the integration of homogeneous modules and their composition in the context of active vision. In particular, we have presented a unified approach that provides both process integration and process composition. The approach combines a formalism for integrating homogeneous modules with a well-known process composition model, Robot Schemas. We have shown how these capabilities are implemented in a working active vision system called the AV-shell. Our experiments in smooth pursuit have confirmed that the performance of several integrated motion-analysis modules is dramatically better than the performance of any of the participating modules run independently. We have also shown that corrective reinforcement can lead to higher tracking success rates. Future work includes experimenting with the composition and fusion of modules for other active vision activities such as fixation and stabilization. Additionally, we would like to extend our work to include non-homogeneous modules.
References
[1] M. Abdulghafour, A. Fellah, and M. A. Abidi. Fuzzy Logic-Based Data Integration: Theory and Applications. Proceedings of the 1994 IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, pages 151-160, October 1994.
[2] Y. Aloimonos and D. Shulman. Integration of Visual Modules: An Extension of the Marr Paradigm. Academic Press, 1989.
[3] Y. Aloimonos, I. Weiss, and A. Bandyopadhyay. Active vision. International Journal of Computer Vision, pages 333-356, 1987.
[4] Tracy L. Anderson and Max Donath. Animal Behavior as a Paradigm for Developing Robot Autonomy. Robotics and Autonomous Systems, 6:145-168, June 1990.
[5] R. Bajcsy. Active perception vs passive perception. In Proceedings of the Third IEEE Workshop on Computer Vision, pages 55-59, Bellaire, Michigan, 1985.
[6] D.H. Ballard. Animate vision. Artificial Intelligence, 48:57-86, 1991.
[7] D.H. Ballard and C.M. Brown. Computer Vision. Prentice-Hall, 1982.
[8] I. Bloch. Information Combination Operators for Data Fusion: A Comparative Review with Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 26(1):42-52, January 1996.
[9] K.J. Bradshaw, P.F. McLauchlan, I.D. Reid, and D.W. Murray. Saccade and pursuit on an active head/eye platform. Image and Vision Computing, 12(3):155-163, April 1994.
[10] R.A. Brooks. A Robust Layered Control System for a Mobile Robot. Autonomous Mobile Robots, pages 152-161, 1990.
[11] C. Brown. Prediction and cooperation in gaze control. Biological Cybernetics, 63(1):61-70, May 1990.
[12] R.H.S. Carpenter. Movements of the Eyes. Pion Limited, London, England, 1988.
[13] H.I. Christensen, J. Horstman, and T. Rasmussen. A control theoretical approach to active vision. In 2nd Asian Conference on Computer Vision, pages 364-369, Singapore, December 1995.
[14] J.J. Clark and N.J. Ferrier. Modal control of an attentive vision system. In Second International Conference on Computer Vision, pages 514-523, Tampa, Florida, December 1988.
[15] D. Coombs and C. Brown. Real-time smooth pursuit tracking for a moving binocular head. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 23-28, Champaign, Illinois, June 1992.
[16] J.L. Crowley and H.I. Christensen. Vision as Process. Springer-Verlag, January 1995.
[17] B.R. Donald. Error Detection and Recovery in Robotics. Springer-Verlag, 1989.
[18] Editors. Information for readers and authors. Any issue of IEEE Transactions on Reliability.
[19] Alberto Elfes. Sonar-Based Real-World Mapping and Navigation. Autonomous Robot Vehicles, pages 233-249, 1990.
[20] Alberto Elfes. Multi-source Spatial Data Fusion Using Bayesian Reasoning. In Data Fusion in Robotics and Machine Intelligence, eds. Abidi and Gonzalez, pages 137-163, 1992.
[21] J. Fayman, E. Rivlin, and H. Christensen. The active vision shell. Technical Report CIS 9510, Technion - Israel Institute of Technology, April 1995.
[22] J.A. Fayman. Medium-level control of robot hands. Master's thesis, San Diego State University, Department of Mathematical Sciences, San Diego, CA, 1990.
[23] R.J. Firby, R.E. Kahn, P.N. Prokopowicz, and M.J. Swain. An architecture for vision and action. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 72-79, Montreal, Canada, August 1995.
[24] E. Gat and G. Dorais. Robot Navigation by Conditional Sequencing. IEEE Int. Conf. on Robotics and Automation, 2:1293-1299, 1994.
[25] S.A. Hutchinson and A.C. Kak. Multisensor Strategies Using Dempster-Shafer Belief Accumulation. In Data Fusion in Robotics and Machine Intelligence, eds. Abidi and Gonzalez, pages 165-209, 1992.
[26] J. Kosecka, R. Bajcsy, and H.I. Christensen. Discrete event modelling of visually guided behaviors. International Journal of Computer Vision, Special Issue on Qualitative Vision, 12(3):295-316, 1995.
[27] S. Kristensen. Sensor planning with Bayesian decision theory. In Reasoning with Uncertainty in Robotics, University of Amsterdam, The Netherlands, December 1995.
[28] Zhi Luo and Dehua Li. Multi-source Information Integration in Intelligent Systems Using the Plausibility Measure. Proceedings of the 1994 IEEE Int. Conf. on Multisensor Fusion and Integration for Intelligent Systems, October 1994.
[29] D.M. Lyons. Representing and analyzing action plans as networks of concurrent processes. IEEE Transactions on Robotics and Automation, 9(7):241-256, June 1993.
[30] D.M. Lyons and M.A. Arbib. A task-level model of distributed computation for sensory-based control of complex robot systems. In IFAC Symposium on Robot Control, Barcelona, Spain, November 1985.
[31] D.M. Lyons and M.A. Arbib. A formal model of computation for sensory-based robotics. IEEE Transactions on Robotics and Automation, 5:280-293, June 1989.
[32] James Manyika and Hugh Durrant-Whyte. Data Fusion and Sensor Management: A Decentralized Information-Theoretic Approach. Ellis Horwood, 1994.
[33] K. Pahlavan, T. Uhlin, and J-O. Eklundh. Dynamic fixation. In Fourth International Conference on Computer Vision, pages 412-419, Berlin, Germany, May 1993.
[34] S. Pankanti, A.K. Jain, and M. Tuceryan. On integration of vision modules. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 316-322, Seattle, Washington, June 1994.
[35] Behrooz Parhami. Voting Algorithms. IEEE Transactions on Reliability, 43(3):617-629, December 1994.
[36] Judea Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. The Morgan Kaufmann Series in Representation and Reasoning. Morgan Kaufmann, San Mateo, California, 1988.
[37] A. Saffiotti, K. Konolige, and E.H. Ruspini. A multivalued logic approach to integrating planning and control. Artificial Intelligence, 76:481-526, March 1995.
[38] Glenn Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
[39] R.G. Simmons. Structured Control for Autonomous Robots. IEEE Transactions on Robotics and Automation, 10(1):34-43, February 1994.
[40] M.J. Swain and M.A. Stricker. Promising directions in active vision. International Journal of Computer Vision, 11(2):109-126, 1993.
[41] John K. Tsotsos. Behaviorism, Intelligence and the Scaling Problem. AAAI 1994, 1993.
[42] T. Uhlin and J-O. Eklundh. Animate vision in a rich environment. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 27-33, Montreal, Canada, August 1995.
[43] M.I. Vuskovic. R-shell: A UNIX-based development environment for robotics. In Proceedings of the IEEE International Conference on Robotics and Automation, Philadelphia, Pennsylvania, April 1988.
[44] M.I. Vuskovic, A.L. Riedel, and C.Q. Do. The robot shell. International Journal of Robotics and Automation, 3(3):165-175, 1988.
[45] W.Y.K. Wai and J.K. Tsotsos. Directing attention to onset and offset of image events for eye-head movement control. In Proceedings of the IEEE Workshop on Visual Behaviors, pages 79-84, Seattle, Washington, June 1994.
[46] Lotfi A. Zadeh. Fuzzy sets. Information and Control, 8:338-353, 1965.
[Figure 8 appears here: fifteen panels, one per frame (frames 1, 11, 21, ..., 141). Each panel shows the vote curves of BLOB, IDIFF, SOB, TM and FUSE over the horizontal image position X, with votes ranging from 0 to 1.]
Figure 8: Sequence showing the tracking results and the fusion process. During the sequence the camera is driven by the fusion module. The plots present each module's votes for each action (here, motion in the horizontal direction). These votes are combined by the fusion module (FUSE), and the best action, indicated by the dot in the plots, is selected.