International Journal of Computer Vision 57(1), 23–47, 2004
© 2004 Kluwer Academic Publishers. Manufactured in The Netherlands.

A General Framework for the Selection of World Coordinate Systems in Perspective and Catadioptric Imaging Applications

JOÃO P. BARRETO AND HELDER ARAÚJO
Institute of Systems and Robotics, Department of Electrical and Computer Engineering, University of Coimbra, Coimbra, Portugal

Received January 15, 2002; Revised November 19, 2002; Accepted April 17, 2003

Abstract. An imaging system with a single effective viewpoint is called a central projection system. The conventional perspective camera is an example of a central projection system. A catadioptric realization of omnidirectional vision combines reflective surfaces with lenses. Catadioptric systems with a unique projection center are also examples of central projection systems. Whenever an image is acquired, points in 3D space are mapped into points in the 2D image plane. The image formation process represents a transformation from R^3 to R^2, and mathematical models can be used to describe it. This paper discusses the definition of world coordinate systems that simplify the modeling of general central projection imaging. We show that an adequate choice of the world coordinate reference system can be highly advantageous. Such a choice does not imply that new information will be available in the images. Instead the geometric transformations will be represented in a common and more compact framework, while simultaneously enabling newer insights. The first part of the paper focuses on static imaging systems that include both perspective cameras and catadioptric systems. A systematic approach to select the world reference frame is presented. In particular we derive coordinate systems that satisfy two differential constraints (the "compactness" and the "decoupling" constraints). These coordinate systems have several advantages for the representation of the transformations between the 3D world and the image plane. The second part of the paper applies the derived mathematical framework to active tracking of moving targets. In applications of visual control of motion the relationship between motion in the scene and image motion must be established. In the case of active tracking of moving targets these relationships become more complex due to camera motion. Suitable world coordinate reference systems are defined for three distinct situations: perspective camera with planar translation motion, perspective camera with pan and tilt rotation motion, and catadioptric imaging system rotating around an axis going through the effective viewpoint and the camera center. Position and velocity equations relating image motion, camera motion and target 3D motion are derived and discussed. Control laws to perform active tracking of moving targets using visual information are established.

Keywords: sensor modeling, catadioptric, omnidirectional vision, visual servoing, active tracking

1. Introduction

Many applications in computer vision, such as surveillance and model acquisition for virtual reality, require that a large field of view is imaged. Visual control of motion can also benefit from enhanced fields of view. The computation of camera motion from a sequence of images obtained with a traditional camera suffers from the problem that the direction of translation may lie outside the field of view. Panoramic imaging overcomes this problem, making the uncertainty of camera motion estimation independent of the motion direction (Gluckman and Nayar, 1998). In position based visual servoing, keeping the target in the field of view during motion raises severe difficulties (Malis et al., 1999). With a large field of view this problem no longer exists.


One effective way to enhance the field of view of a camera is to use mirrors (Bogner, 1995; Nalwa, 1996; Yagi and Kawato, 1990; Yamazawa et al., 1993, 1995). The general approach of combining mirrors with conventional imaging systems is referred to as catadioptric image formation (Hecht and Zajac, 1974). The fixed viewpoint constraint is a requirement ensuring that the visual sensor only measures the intensity of light passing through a single point in 3D space (the projection center). Vision systems verifying the fixed viewpoint constraint are called central projection systems. Central projection systems present interesting geometric properties. A single effective viewpoint is a necessary condition for the generation of geometrically correct perspective images (Baker and Nayar, 1998), and for the existence of epipolar geometry inherent to the moving sensor and independent of the scene structure (Svoboda et al., 1998). It is highly desirable for any vision system to have a single viewpoint.

The conventional perspective CCD camera is widely used in computer vision applications. In general it is described by a central projection model with a single effective viewpoint. Central projection cameras are specializations of the general projective camera that can be modeled by a 3 × 4 matrix with rank 3 (Hartley and Zisserman, 2000). Baker and Nayar (1998) derive the entire class of catadioptric systems with a single effective viewpoint. Systems built using a parabolic mirror with an orthographic camera, or a hyperbolic, elliptical or planar mirror with a perspective camera, verify the fixed viewpoint constraint. Geyer and Daniilidis (2000) introduce a unifying theory for all central catadioptric systems where conventional perspective imaging appears as a particular case. They show that central panoramic projection is isomorphic to a projective mapping from the sphere to a plane with a projection center on the perpendicular to the plane. A modified version of this unifying model is introduced in Barreto and Araujo (2001).

General central projection image formation can be represented by a transformation from R^3 to R^2. Whenever an image is acquired, points in 3D space are mapped into points in the 2D image plane. Cartesian coordinate systems are typically used to reference points both in space and in the image plane. The mapping is non-injective and implies loss of information. The relationships between position and velocity in the 3D space and position and velocity in the image are in general complex, difficult and non-linear.

This paper shows that the choice of the coordinate system used to reference points in the 3D space is important. The intrinsic nature of the image formation process is kept unchanged, but the mathematical relationship between the world and the image becomes simpler and more intuitive. This can help not only the understanding of the imaging process but also the development of new algorithms and applications.

The first part of the paper focuses on static imaging systems that include both perspective cameras and catadioptric systems. A general framework to describe the mapping from 3D points to 2D points in the image plane is presented. The mathematical expression of this global mapping depends on the coordinate system used to reference points in the scene. A systematic approach to select the world coordinate system is presented and discussed. Differential constraints are defined to enable the choice of a 3D reference frame. Coordinate transformations satisfying these differential constraints bring advantageous properties when mapping 3D space velocities into 2D image velocities. One such coordinate transformation is described for the case of the perspective camera and then generalized for central catadioptric image formation. Using these coordinate transformations does not imply that new information is available in the images. Instead the geometric transformations are represented in a common and more compact framework, while simultaneously enabling newer insights into the image formation process. Examples and applications that benefit from an adequate choice of the world coordinate system are presented and discussed.

The second part of our article applies the derived mathematical framework to active tracking of moving targets. For this purpose it is assumed that the imaging sensor is mounted on a moving platform. Three different cases are considered: a perspective camera with translational motion in the XY plane, a perspective camera with rotational pan and tilt motion, and a parabolic omnidirectional camera with a rotational degree of freedom around the Z axis. The goal of the tracking application is to control the motion of the platform in such a way that the position of the target in the image plane is kept constant. In the classical eye-in-hand positioning problem the camera is typically attached to the end effector of a 6 d.o.f. manipulator. The platforms considered in this work have less than 3 d.o.f. For the purpose of controlling the constrained 3D motion of these robots it is not necessary to determine the full pose of the target. It is assumed that target motion is characterized by the 3D position and velocity of the corresponding mass center in an inertial reference frame.

It is also assumed that the position of each degree of freedom is known (possibly via an encoder). In active tracking applications the image motion depends both on target and camera 3D motion. The derived general framework to describe the mapping between 3D points and points in the 2D image plane is extended to central catadioptric imaging systems with rigid motion. The mathematical expression of the global mapping depends on the world coordinates used to reference points in the scene. General criteria to select suitable coordinate systems are discussed. Adequate choices are presented for each type of platform. The derived mathematical framework is used to establish the position and velocity relationships between target 3D motion, camera motion and image motion. The expressions obtained are used to implement image based active visual tracking. Simplifications of the equations obtained (to decouple the degrees of freedom of the pan and tilt vision system) are discussed.

2. Static Imaging Systems

This section focuses on static central projection vision systems. Examples of such systems are the perspective camera and catadioptric systems that verify the fixed viewpoint constraint (Baker and Nayar, 1998). The image acquisition process maps points from the 3D space into the 2D image plane. Image formation performs a transformation from R^3 to R^2 that can be denoted by F. A generic framework to illustrate the transformation F is proposed. This framework is general to both conventional perspective cameras and central projection catadioptric systems. It is desirable that F be as simple and as compact as possible. This can be achieved by selecting a specific coordinate system to reference the world points. General criteria to select the world coordinate system are presented and discussed. Advantages of using different world coordinate systems to change the format of the F mapping are presented.

2.1. Mapping Points from the 3D Space into the 2D Image Plane

Figure 1 depicts a generic framework to illustrate the transformation F from R^3 into R^2 performed by a central projection vision system.

Figure 1. Schematic of the mapping performed by general central projection imaging systems: Ω = (φ, ψ, ρ) ∈ R^3 → [T] → Xw = (X, Y, Z) ∈ R^3 → [fh] → Xh = (X, Y, Z, 1) ∈ P^3 → [P = R[I | −C]] → x = (x, y, z) ∈ P^2 → [fi] → xi = (xi, yi) ∈ R^2.

If the vision system has a unique viewpoint, it preserves central projection and geometrically correct perspective images can always be generated (Gluckman and Nayar, 1998). Xw = (X, Y, Z)^t is a vector with the Cartesian 3D coordinates of a point in space. The domain of the transformation is the set D of visible points in the world, with D ⊂ R^3. Function fh maps R^3 into the projective space P^3. It is a non-injective and surjective function transforming Xw = (X, Y, Z)^t into Xh = (X, Y, Z, 1)^t, the homogeneous world point coordinates. P is an arbitrary 3 × 4 homogeneous matrix with rank 3. It represents a general projective transformation performing a linear mapping of P^3 into the projective plane P^2 (x = PXh). The rank 3 requirement is due to the fact that if the rank were less than 3 then the range of the matrix would be a line or a point and not the whole plane; it guarantees that the transformation is surjective. In the case of P being a camera model it can be written as P = KR[I | −C], where I is a 3 × 3 identity matrix, K is the intrinsic parameters matrix, R the rotation matrix between camera and world coordinate systems, and C the projection center in world coordinates (Hartley and Zisserman, 2000). If nothing is stated we will assume K = I and standard central projection with P = [I | 0]. Function fi transforms coordinates in the projective plane x = (x, y, z)^t into Cartesian coordinates in the image plane xi = (xi, yi)^t.

It is a non-injective, surjective function from P^2 into R^2 that maps projective rays in the world into points in the image. For conventional perspective cameras xi = fi(x) ⇔ (xi, yi) = (x/z, y/z). However, as will be shown later, these relations are more complex for generic catadioptric systems.

The transformation F maps 3D world points into 2D points in the image. Points in the scene were represented using standard Cartesian coordinates. However a different coordinate system can be used to reference points in the 3D world space. Assume that Ω = (φ, ψ, ρ)^t are point coordinates in the new reference frame and that Xw = T(Ω), where T is a bijective function from R^3 into R^3. The transformation F, mapping 3D world points Ω into image points xi (see Eq. (1)), can be written as the composition of Eq. (2).

    xi = F(Ω)    (1)
    F(Ω) = fi(P fh(T(Ω)))    (2)

Equation (3), obtained by differentiating Eq. (1) with respect to time, establishes the relationship between the velocity in 3D space Ω̇ = (φ̇, ψ̇, ρ̇)^t and the velocity in the image ẋi = (ẋi, ẏi)^t. ẋi and Ω̇ are related by the jacobian matrix JF of transformation F. Equation (4) shows JF as the product of the Jacobians of the transformations that make up F.

    ẋi = JF Ω̇    (3)
    JF = Jfi · JP · Jfh · JT    (4)

Function T represents a change of coordinates. It must be bijective, which guarantees that it admits an inverse. Assume that Γ is the inverse function of T (Γ = T^{-1}). Function Γ, from R^3 into R^3, transforms Cartesian coordinates Xw into new coordinates Ω (Eq. (5)). JΓ is the jacobian matrix of Γ (Eq. (6)). If T is injective then the jacobian matrix JT is non-singular with inverse JΓ. Replacing JT by JΓ^{-1} in Eq. (4) yields Eq. (7), showing the jacobian matrix of F expressed in terms of the scalar functions of Γ and their partial derivatives.

    Γ(Xw) = (φ(X, Y, Z), ψ(X, Y, Z), ρ(X, Y, Z))^t    (5)

    JΓ = [ φ_X  φ_Y  φ_Z;  ψ_X  ψ_Y  ψ_Z;  ρ_X  ρ_Y  ρ_Z ]    (6)

    JF = Jfi · JP · Jfh · JΓ^{-1}    (7)
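As a concrete illustration of the composition in Eqs. (2)–(7), the following Python sketch (ours, not part of the original paper) builds F for a perspective camera with K = I and P = [I | 0], using an arbitrary spherical-coordinate change as a placeholder for T, and checks the factorization of Eq. (4) with finite-difference Jacobians.

    import numpy as np

    # Numerical sketch of Eqs. (2)-(4), assuming K = I and P = [I | 0].
    # The coordinate change T used here (spherical coordinates) is only an
    # illustrative placeholder, not the transformation derived in the paper.

    P = np.hstack([np.eye(3), np.zeros((3, 1))])

    f_h = lambda Xw: np.append(Xw, 1.0)                   # R^3 -> P^3
    f_i = lambda x: np.array([x[0] / x[2], x[1] / x[2]])  # P^2 -> R^2

    def T(Om):                                            # placeholder change of coordinates
        phi, psi, rho = Om
        return rho * np.array([np.sin(psi) * np.cos(phi),
                               np.sin(psi) * np.sin(phi),
                               np.cos(psi)])

    F = lambda Om: f_i(P @ f_h(T(Om)))                    # Eq. (2)

    def num_jac(f, x, eps=1e-6):                          # forward-difference Jacobian
        f0 = np.asarray(f(x))
        return np.stack([(np.asarray(f(x + eps * e)) - f0) / eps
                         for e in np.eye(len(x))], axis=1)

    Om = np.array([0.3, 0.5, 2.0])                        # (phi, psi, rho), point in front of the camera
    J_F = num_jac(F, Om)                                  # jacobian of the composed mapping
    J_fi = num_jac(f_i, P @ f_h(T(Om)))                   # Jfi evaluated at the projective point
    J_fh = np.vstack([np.eye(3), np.zeros((1, 3))])       # jacobian of f_h
    J_T = num_jac(T, Om)                                  # jacobian of the coordinate change
    assert np.allclose(J_F, J_fi @ P @ J_fh @ J_T, atol=1e-4)   # Eq. (4)

Replacing the placeholder T by the transformation derived in Section 2.3 makes the third column of J_F vanish, which is exactly the "compactness" property discussed next.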

2.2. Criteria to Select the World Coordinate System

Function F is a transformation from R^3 (3D world space) into R^2 (image plane). In Eqs. (8) and (9), F and JF are written in terms of scalar functions and their partial derivatives. The relationship between world and image points can be complex and counter-intuitive. The mathematical expression of the mapping function F depends on the transformation T (see Eqs. (2), (4) and (7)). The selection of a certain coordinate system to reference points in the scene changes the way F is written but does not change the intrinsic nature of the mapping. However, with an adequate choice of the world coordinate system, the mathematical relationship between position and velocity in space and position and velocity in the image plane can become simpler, more intuitive or simply more suitable for a specific application. In this section we discuss criteria for the selection of the world coordinate system.

    F(Ω) = (h(φ, ψ, ρ), g(φ, ψ, ρ))^t    (8)
    JF = [ h_φ  h_ψ  h_ρ;  g_φ  g_ψ  g_ρ ]    (9)

2.2.1. The Compactness Constraint. Consider central projection vision systems as mappings of 3D points, expressed in Cartesian coordinates Xw = (X, Y, Z)^t, into the 2D image coordinates xi = (xi, yi). The transformation is a function from R^3 into R^2 with loss of information (depth). In general the two coordinates in the image plane depend on the three coordinates in space. The image gives partial information about each one of the three world coordinates but we are not able to recover any of those parameters without further constraints. The imaging process implies loss of information and there is no additional transformation T that can change that. However it would be advantageous that the image coordinates depend only on two of the 3D parameters. In many situations that can be achieved by means of a change of coordinates T. The coordinate change must be performed in such a way that F only depends on two of those coordinates. Assuming that Ω = (φ, ψ, ρ) are the new 3D coordinates, F becomes a function of only φ and ψ, which means that the partial derivatives h_ρ and g_ρ are both equal to zero. Whenever a certain change of coordinates T leads to a jacobian matrix JF with a zero column, it is said that the mapping F is in a compact form and that the coordinate transformation T verifies the "compactness constraint".

Assume that a world coordinate system satisfying the "compactness constraint" is selected. If Eq. (10) is verified then the image coordinates (xi, yi) depend only on (φ, ψ) and F becomes a function from R^2 into R^2 (xi = F(Ωc) with Ωc = (φ, ψ)^t). A function from R^3 into R^2 is never invertible, thus putting F in a compact form is a necessary condition to find an inverse mapping F^{-1}. If F^{-1} exists then two of the three 3D parameters of motion can be recovered from the image (Ωc = F^{-1}(xi)) and the jacobian matrix JF can be written in terms of the image coordinates xi and yi. By verifying the "compactness constraint" the relationships in position and velocity between the 3D world and the image plane tend to be more compact and intuitive, and vision yields all the information about two of the 3D world coordinates and none about the third one.

    h_ρ = 0 ∧ g_ρ = 0    (10)

2.2.2. The Decoupling Constraint. Assume that the "compactness constraint" is verified. This means that a coordinate transformation T is used such that the image coordinates (xi, yi) depend only on (φ, ψ). It would also be advantageous to define a world coordinate system such that xi depends only on φ and yi depends only on ψ. This is equivalent to saying that h_ψ and g_φ are both zero. This one-to-one correspondence is an advantageous feature, allowing a better understanding of the imaging process and simplifying subsequent calculations. If a coordinate transformation T is used such that both Eqs. (10) and (11) are verified, then it is said that F is in a compact and decoupled form and that T verifies both the "compactness constraint" and the "decoupling constraint".

    h_ψ = 0 ∧ g_φ = 0    (11)

In short, given a general central projection mapping, the goal is to select a coordinate transformation T verifying both:

• the "compactness constraint" (Eq. (10))
• the "decoupling constraint" (Eq. (11))

The coordinate system used to reference points in the scene does not change the intrinsic nature of the mapping nor introduce any additional information. There are situations where it is impossible to find a world coordinate transformation that verifies the "compactness constraint" and/or the "decoupling constraint". Methodologies to find out whether such a transformation exists will be introduced later.

2.3. Conventional Perspective Camera

Consider image acquisition performed by a static conventional perspective camera. The image formation process follows the scheme depicted in Fig. 1 where function fi is given by Eq. (12). Assume that the matrix of intrinsic parameters is K = I and P = [I | 0] (the origin of the cartesian reference frame is coincident with the camera center and the image plane is perpendicular to the Z axis). This section derives a world coordinate system that verifies both the compactness and decoupling constraint. If nothing is stated we will work with the inverse transformation Γ instead of the direct transformation T.

    fi(): (x, y, z) → (x/z, y/z)    (12)

2.3.1. Constraining Γ to Obtain a New World Coordinate System. Functions fi, P and fh, as well as their jacobian matrices, are defined for the perspective camera case. Replacing JΓ (Eq. (6)) in Eq. (7) yields JF in terms of the partial derivatives of the scalar functions of Γ (the computation is omitted). If F is in a compact form then the third column of JF must be zero (Eq. (10)), which leads to Eqs. (13). A transformation of coordinates Γ that verifies the compactness constraint can be computed by solving the partial differential Eqs. (13) with respect to the scalar functions φ, ψ and ρ (Eq. (5)).

    Z(φ_Y ψ_Z − φ_Z ψ_Y) + X(φ_Y ψ_X − φ_X ψ_Y) = 0
    Z(φ_Z ψ_X − φ_X ψ_Z) + Y(φ_Y ψ_X − φ_X ψ_Y) = 0    (13)

The partial differential equations corresponding to the "decoupling constraint" can be derived in a similar way. If the mapping F is decoupled then both h_ψ and g_φ must be zero, which leads to Eq. (14). A world coordinate transformation Γ verifying both the compactness and the decoupling constraints can be computed by solving Eqs. (13) and (14) simultaneously. Nevertheless the integration of systems of partial differential equations can be difficult and in general it generates many solutions. Adequate coordinate systems will be derived by geometrical means. Equations (13) and (14) will be used to prove that the selected coordinate transformation verifies the compactness and/or decoupling constraints.

    Z(φ_Z ρ_Y − φ_Y ρ_Z) + X(φ_X ρ_Y − φ_Y ρ_X) = 0
    Z(ψ_Z ρ_X − ψ_X ρ_Z) + Y(ψ_Y ρ_X − ψ_X ρ_Y) = 0    (14)


Table 1. Using a new coordinate system for perspective camera.

    Γ(Xw) = ( arctan(X/Z), arctan(−Y/Z), √(X² + Y² + Z²) )^t

    T(Ω) = ρ/√(1 + tan²(φ) + tan²(ψ)) · ( tan(φ), −tan(ψ), 1 )^t

    F(Ω) = ( tan(φ), −tan(ψ) )^t

    JF(Ω) = [ 1/cos²(φ)  0  0;  0  −1/cos²(ψ)  0 ]

Figure 2. Conventional perspective camera case.

Figure 2 is a representation of the image formation process for the specified perspective camera situation. A point in the scene with cartesian coordinates (X, Y, Z) is projected in the image plane at position (xi, yi). It is assumed that only the points in front of the camera are visible (Z > 0). The world points that can be projected in a certain image vertical line lie in a vertical plane rotated around the camera referential Y axis, containing the center of projection. In a similar manner, the world points projected in a horizontal line of the image belong to a horizontal plane rotated around the X axis. Consider the pencil of vertical planes where each one is indexed by the corresponding rotation angle φ. There is a one to one correspondence between the vertical lines in the image and the φ angles. In a similar manner, a pencil of horizontal planes, indexed by the corresponding rotation angle ψ, is defined. Each horizontal line in the image is associated to one ψ angle. Each pair (φ, ψ) defines a pair of planes that intersect in a projective ray. If the depth of the world point along the ray is ρ (always positive), the set of values (φ, ψ, ρ) gives the position of the world point in a unique way. We have derived a new system of coordinates to represent points in the 3D space. Equation (15) establishes the relationship between (φ, ψ, ρ) and the conventional Cartesian coordinates (X, Y, Z).

    φ = arctan(X/Z)
    ψ = arctan(−Y/Z)
    ρ = √(X² + Y² + Z²)    (15)

Equation (16) gives the jacobian matrix of the derived coordinate transformation Γ. The proposed change of coordinates is a solution of the set of differential Eqs. (13) and (14). Γ satisfies both the compactness and decoupling constraints for the static perspective camera case.

    JΓ = [ Z/(X² + Z²)          0                   −X/(X² + Z²);
           0                    −Z/(Y² + Z²)        Y/(Y² + Z²);
           X/√(X² + Y² + Z²)    Y/√(X² + Y² + Z²)   Z/√(X² + Y² + Z²) ]    (16)
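As a quick sanity check (ours, not from the paper), the differential constraints (13) and (14) can be verified symbolically for the change of coordinates of Eq. (15), for instance with sympy; all four expressions simplify to zero.

    import sympy as sp

    # Symbolic check that the coordinate change of Eq. (15) satisfies the
    # compactness PDEs (13) and the decoupling PDEs (14). Assumes Z > 0.
    X, Y, Z = sp.symbols('X Y Z', positive=True)

    phi = sp.atan(X / Z)                       # Eq. (15)
    psi = sp.atan(-Y / Z)
    rho = sp.sqrt(X**2 + Y**2 + Z**2)

    d = lambda f, v: sp.diff(f, v)

    eq13a = Z*(d(phi,Y)*d(psi,Z) - d(phi,Z)*d(psi,Y)) + X*(d(phi,Y)*d(psi,X) - d(phi,X)*d(psi,Y))
    eq13b = Z*(d(phi,Z)*d(psi,X) - d(phi,X)*d(psi,Z)) + Y*(d(phi,Y)*d(psi,X) - d(phi,X)*d(psi,Y))
    eq14a = Z*(d(phi,Z)*d(rho,Y) - d(phi,Y)*d(rho,Z)) + X*(d(phi,X)*d(rho,Y) - d(phi,Y)*d(rho,X))
    eq14b = Z*(d(psi,Z)*d(rho,X) - d(psi,X)*d(rho,Z)) + Y*(d(psi,Y)*d(rho,X) - d(psi,X)*d(rho,Y))

    print([sp.simplify(e) for e in (eq13a, eq13b, eq14a, eq14b)])   # -> [0, 0, 0, 0]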

Table 1 summarizes the results obtained in this section. Γ is a world coordinate transformation verifying both the compactness and decoupling constraints. Notice that the new coordinate system is different from the well known spherical coordinates. T is the inverse function of Γ. Replacing T in Eq. (2) (fi, P and fh were already defined) leads to the mathematical expression of the global mapping F using the new coordinates. The jacobian matrix JF is obtained by replacing JΓ in Eq. (7) by the result of Eq. (16).

2.3.2. Applications. The global mapping F and its jacobian matrix establish the relationship between position/velocity in space and position/velocity in the image for the conventional camera situation. Assume that both the image position xi and velocity ẋi are known. It was stated that there is a loss of information in the image formation process. Therefore it is not possible to fully recover 3D target motion from images without further information. Nevertheless, using a world coordinate system verifying the compactness and decoupling constraints, it is possible to partially recover the 3D parameters of motion in a straightforward manner. Table 1 shows the mapping F and the corresponding jacobian matrix JF written in terms of the derived system of coordinates. Table 2 gives the mathematical expressions of F and JF using world cartesian coordinates. In the former situation F is in a compact form and it is possible to invert the mapping to recover position information: one obtains that φ = arctan(xi) and ψ = −arctan(yi).


Table 2. Using a cartesian coordinate system for perspective camera.

    F(Xw) = ( X/Z, Y/Z )^t

    JF(Xw) = [ 1/Z  0  −X/Z²;  0  1/Z  −Y/Z² ]

When using cartesian coordinates, F appears as a function from R^3 into R^2, and the inversion is not possible. Equation (17) shows JF written in terms of image coordinates. It is derived by replacing (φ, ψ) in the jacobian matrix of Table 1 by (arctan(xi), −arctan(yi)). Knowing both position and velocity in the image one obtains φ̇ = (1 + xi²)^{-1} ẋi and ψ̇ = −(1 + yi²)^{-1} ẏi. Using the derived world coordinate system it is possible to partially recover the position and velocity of the target in the scene.

    JF(xi) = [ 1 + xi²  0  0;  0  −(1 + yi²)  0 ]    (17)
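For illustration, the partial recovery just described amounts to a couple of lines of code. The sketch below is ours and assumes K = I and the coordinate system of Table 1; variable names are illustrative.

    import numpy as np

    # Partial 3D recovery for the perspective case: phi and psi follow from
    # the image point, their rates from Eq. (17). Image measurements assumed given.

    def recover_angles(xi, yi):
        return np.arctan(xi), -np.arctan(yi)          # phi, psi

    def recover_rates(xi, yi, xi_dot, yi_dot):
        phi_dot = xi_dot / (1.0 + xi**2)
        psi_dot = -yi_dot / (1.0 + yi**2)
        return phi_dot, psi_dot

    print(recover_angles(0.5, -0.2))
    print(recover_rates(0.5, -0.2, 0.01, 0.03))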

2.4. Central Catadioptric Imaging System

A catadioptric realization of omnidirectional vision combines reflective surfaces and lenses. Central catadioptric imaging can be highly advantageous for many applications because it combines two important features: a single projection center and a wide field of view. The drawback of this type of sensor is that in general the mapping between points in the 3D world and in the image is highly non-linear. Many times correct perspective images are generated from frames captured by catadioptric sensors, and subsequently processed. We wish to study the advantages of working directly with the catadioptric images without warping them.

Geyer and Daniilidis (2000) introduce a unifying theory for central catadioptric systems. A modified version of this unifying model was presented in Barreto and Araujo (2001). The model is used to derive a general transformation of coordinates that leads to a mapping between points in the world and in the image that verifies both the compactness and the decoupling constraints. Some applications are presented and discussed.

2.4.1. General Model for Central Projection Systems. Baker and Nayar (1998) derive the entire class of catadioptric systems with a single effective viewpoint.

Figure 3. Central catadioptric image formation.

Figure 3 is a scheme of the catadioptric system combining a hyperbolic reflective surface with a perspective camera. The hyperbola is placed such that its axis is the Z-axis, its foci are coincident with O and Ocam (the origins of the mirror and camera coordinate systems), its latus rectum is 4p and the distance between the foci is d. Light rays incident with O (the inner focal point) are reflected into rays incident with Ocam (the outer focal point). Assume a perspective camera with projection center in Ocam pointed at the mirror surface. All the captured light rays go originally through the inner focus of the hyperbolic surface. The effective viewpoint of the grabbed image is O and it is unique. Elliptical catadioptric images are obtained by combining an elliptical mirror with a perspective camera in a similar way. In the parabolic situation a parabolic mirror is placed such that its axis is the Z-axis and its unique finite real focus is coincident with O. Light rays incident with O are reflected into rays parallel with the Z-axis, which are captured by an orthographic camera with image plane perpendicular to the Z-axis. The effective viewpoint is in O and it is unique. A catadioptric system made up of a perspective camera and a planar mirror also verifies the fixed viewpoint constraint. The effective projection center is behind the mirror, on the perpendicular line passing through the camera center. Its distance to the camera center is twice the distance between the planar mirror and the camera.

Consider a generic scene point, visible by the catadioptric system, with cartesian coordinates Xw in the world reference frame. The corresponding homogeneous representation is Xh.

Table 3. Mapping parameters l and m.

                  l                      m
  Parabolic       1                      1 + 2p
  Hyperbolic      d/√(d² + 4p²)          (d + 2p)/√(d² + 4p²)
  Elliptical      d/√(d² + 4p²)          (d − 2p)/√(d² + 4p²)
  Planar          0                      1


Visible points in the scene Xh are mapped into projective rays/points x in the catadioptric system reference frame centered in the effective viewpoint. The transformation is linear, being described by a 3 × 4 matrix P (if nothing is stated it is assumed that P = [I | 0]). To each oriented projective ray/point x corresponds a projective ray/point xcam in a coordinate system whose origin is in the camera projection center. Notice that x and xcam must intersect in the mirror surface (see Fig. 3). If xcam = (xcam, ycam, zcam)^t and K = I (camera intrinsic parameters) then the point image coordinates are (xi, yi) = (xcam/zcam, ycam/zcam). In Barreto and Araujo (2001) it is proved that the function fi, transforming coordinates in the projective plane x = (x, y, z)^t into cartesian coordinates in the catadioptric image plane xi = (xi, yi)^t, is given by Eq. (18) and that general central catadioptric image formation follows the scheme of Fig. 1. Function fi depends on the mirror parameters l and m. Table 3 shows these parameters for the different situations of central catadioptric imaging.

    fi(): (x, y, z) → ( (m − l)x / (z + l√(x² + y² + z²)),  −(m − l)y / (z + l√(x² + y² + z²)) )    (18)

Figure 4 depicts an intuitive "concrete" model for the proposed general central projection mapping. To each visible point in space corresponds an oriented projective ray x joining the 3D point with the effective projection center O. The projective ray intersects a unit sphere centered in O in a unique point Q. Consider a point Oc with coordinates (0, 0, −l)^t in the sphere reference frame. To each x corresponds an oriented projective ray xc joining Oc with the intersection point Q on the sphere surface. Assume that the catadioptric image plane is the horizontal plane Z = m. The projective ray xc intersects the plane at xi, which are the coordinates of the image point. The scene is projected onto the sphere surface and then points on the sphere are re-projected into the catadioptric image plane from a novel projection center Oc.

Figure 4. Unit sphere "concrete" model for general central catadioptric image projection.
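The following sketch (ours) implements Eq. (18) together with the parameters of Table 3; the mirror parameters p and d are arbitrary example values, and the function and variable names are illustrative.

    import numpy as np

    # Minimal sketch of the unifying projection model, Eq. (18) and Table 3, assuming K = I.

    def mirror_params(kind, p=1.0, d=2.0):
        if kind == "parabolic":  return 1.0, 1.0 + 2*p
        if kind == "hyperbolic": return d/np.sqrt(d**2 + 4*p**2), (d + 2*p)/np.sqrt(d**2 + 4*p**2)
        if kind == "elliptical": return d/np.sqrt(d**2 + 4*p**2), (d - 2*p)/np.sqrt(d**2 + 4*p**2)
        if kind == "planar":     return 0.0, 1.0
        raise ValueError(kind)

    def f_i(ray, l, m):
        """Eq. (18): projective ray (x, y, z) -> catadioptric image point."""
        x, y, z = ray
        denom = z + l*np.sqrt(x**2 + y**2 + z**2)
        return np.array([(m - l)*x/denom, -(m - l)*y/denom])

    l, m = mirror_params("parabolic", p=1.0)
    print(f_i(np.array([1.0, 0.5, 2.0]), l, m))   # image of the ray through (1, 0.5, 2)

    l, m = mirror_params("planar")                # degenerate case: l = 0
    print(f_i(np.array([1.0, 0.5, 2.0]), l, m))   # -> (x/z, -y/z), the perspective mapping up to the y flip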

Point Oc = (0, 0, −l)^t only depends on the mirror parameters (see Table 3). For a parabolic mirror l = 1 and Oc belongs to the sphere surface; the re-projection is a stereographic projection. For the hyperbolic and elliptical cases Oc is inside the sphere, on the negative Z-axis. The planar mirror is a degenerate case of central catadioptric projection where l = 0 and Oc is coincident with O. Function fi becomes similar to the one given by Eq. (12) for the perspective camera situation, and the global mapping F is equal to the one studied in the previous section.

2.4.2. The New World Coordinate System. The image formation process of a general central catadioptric system fits the scheme depicted in Fig. 1. The difference from the previous case of the perspective camera is that function fi is given by Eq. (18) instead of Eq. (12), where l and m depend on the mirror parameters (see Table 3). The goal of this section is to derive a coordinate transformation Γ for which the global mapping F between points in the world and in the catadioptric image is in a compact and decoupled form. Differential constraints similar to Eqs. (13) and (14) can be derived in the same manner for this more general situation. As already mentioned, the integration of partial differential equations can be a complex task, leading to multiple solutions. Once again the suitable new coordinates are derived geometrically and the differential constraints are used to confirm the results.

In Fig. 4 consider the vertical line in the catadioptric image plane parallel to the Y axis. All the points in the world that can be projected in this line lie in a conic surface, with the vertex coincident with the effective viewpoint O.

The conic surface intersects the unit sphere on a circumference, passing through the projection point Q, which limits a circle containing the point Oc = (0, 0, −l)^t. The axis of the conic surface is always contained in the XOZ plane. In a similar way, the world points that can be projected in a horizontal line in the image lie in a conic surface. The difference is that the axis is now in the YOZ plane. We will call the first pencil of conic surfaces (with the axis in XOZ) the vertical pencil and the second (with the axis in YOZ) the horizontal pencil. A vertical and a horizontal conic surface intersect in two oriented projection rays. This ambiguity can be solved assuming that the camera used to acquire the catadioptric images is forward looking and that only points with Z > −l are visible in the image plane. This section defines a coordinate system based on these two pencils of cones. It is shown that the resulting transformation of coordinates Γ verifies both the compactness and decoupling constraints for the general central projection situation.

Consider the dotted conic surface depicted in Fig. 5. Its axis coincides with the referential X axis, the vertex coincides with the origin and the aperture angle is α. World points (X, Y, Z) lying on the conic surface verify Eq. (19). Consider now the conic surface rotated around the Y axis by an angle φ. Equation (20) is obtained by multiplying the point coordinates by the rotation matrix e^{φŷ} (where ŷ is the unit vector of the Y axis) (Murray et al., 1994) and replacing them in Eq. (19). Equation (20) defines the rotated conic surface depicted in Fig. 5 (Left). Angle α is the aperture angle and φ is the rotation angle around the Y axis that can be used to index the cones in the vertical pencil described above.

    Z² + Y² = X² tan²(α)    (19)

    (X sin(φ) + Z cos(φ))² − (X cos(φ) − Z sin(φ))² tan²(α) + Y² = 0    (20)

Observe Fig. 5 (Right), where the rotated conic surface of Eq. (20), the unit sphere and the re-projection center Oc are depicted. The lower cone must intersect the unit sphere on a circle containing the point (0, 0, −l). World scene points lying on this cone project on a vertical line in the catadioptric image plane (see Fig. 4). Figure 5 (Right) shows that the aperture angle of the conic surface must be tan(α) = c/e, where e² = l² sin²(φ) and c² = 1 − e² (a unit sphere is assumed). Equation (21) is obtained replacing this result in Eq. (20).

    l² sin²(φ)(X² + Y² + Z²) − (X cos(φ) − Z sin(φ))² = 0    (21)

Solving Eq. (21) with respect to φ we are able to compute the vertical conic surface with the desired features that contains a certain world point (X, Y, Z ). Notice however that it is a second order equation, thus for each point in 3D space, there are two φ solutions. Each solution defines a conic surface containing the point and intersecting the unit sphere in a circle passing through its projection Q at the sphere surface. Nevertheless one of these circles contains point (0, 0, l)t , while the other contains point (0, 0, −l)t . The second solution is the one that must be used.


Figure 5. Left: Vertical pencil of conic surfaces indexed by the φ rotation angle. Right: Vertical pencil of conic surfaces. Intersection of the lower and upper cone with the unit sphere.


The ψ angle is used in the same manner to represent the horizontal pencil of conic surfaces. The derivation of the relationship between ψ and the point world coordinates (X, Y, Z) is similar. At this point, given a world point represented in Cartesian coordinates (X, Y, Z), we are able to compute a vertical and a horizontal conic surface, respectively referenced in a unique way by φ and ψ, that contain the point. In general the two cones intersect in two projective rays. However only one of them is visible in the image. The point lies on this projective ray, and a third coordinate ρ, the distance from the effective viewpoint, must be introduced. Equation (22) yields the derived change of coordinates Γ. Both φ and ψ are in the range [−π/2, π/2].

    φ = arctan( X / (Z + l√(X² + Y² + Z²)) )
    ψ = arctan( −Y / (Z + l√(X² + Y² + Z²)) )
    ρ = √(X² + Y² + Z²)    (22)

Table 4 summarizes the results obtained so far for general central projection systems. Γ is the derived world coordinate transformation that maps cartesian coordinates Xw into the new coordinates Ω associated with the conic surfaces reference frame. T is the inverse function of Γ. Replacing T in Eq. (2) (fi, P and fh were already defined) yields the mathematical expression of the global mapping F using the new coordinates. The corresponding jacobian matrix is JF. According to Eqs. (10) and (11), the derived coordinate transformation Γ verifies both the compactness and decoupling constraints. Notice that if l = 0 then we have the perspective camera situation studied previously. The aperture of the reference conic surfaces is always π/2 and the change of coordinates (22) becomes equal to (15). The pencils of conic surfaces degenerate into the pencils of planes derived above. The coordinate transformation proposed for the perspective camera situation is a particular case of this more general solution.

Table 4. Using a new coordinate system for general central catadioptric imaging.

    Γ(Xw) = ( arctan( X / (Z + l√(X² + Y² + Z²)) ),  arctan( −Y / (Z + l√(X² + Y² + Z²)) ),  √(X² + Y² + Z²) )^t

    T(Ω) = ρ/(1 + tan²(φ) + tan²(ψ)) · ( (l + √(1 + (1 − l²)(tan²(φ) + tan²(ψ)))) tan(φ),
                                          −(l + √(1 + (1 − l²)(tan²(φ) + tan²(ψ)))) tan(ψ),
                                          √(1 + (1 − l²)(tan²(φ) + tan²(ψ))) − l(tan²(φ) + tan²(ψ)) )^t

    F(Ω) = (m − l) ( tan(φ), tan(ψ) )^t

    JF(Ω) = (m − l) [ 1/cos²(φ)  0  0;  0  1/cos²(ψ)  0 ]

2.4.3. Applications. Applying the derived coordinate transformation Γ does not modify the image formation process nor introduce new information in the problem. Instead the geometric transformations are represented in a way that enables newer insights into the imaging process and is more suitable to develop certain applications. This section shows the advantages of expressing the global mapping function in terms of an adequate system of world coordinates.

Using the derived world coordinates Ω, the global mapping F is written in a compact and decoupled form (see Table 4). Knowing both the target image position xi and velocity ẋi, the recovery of the 3D parameters of motion is straightforward. As a result φ = arctan(xi/(m − l)) and ψ = arctan(yi/(m − l)). Replacing these parameters in the jacobian matrix of Table 4 yields the jacobian matrix of the global mapping F written in terms of image coordinates (Eq. (23)). It follows that φ̇ = (1 + xi²/(m − l)²)^{-1} ẋi/(m − l) and ψ̇ = (1 + yi²/(m − l)²)^{-1} ẏi/(m − l). Similarly to the perspective camera case, catadioptric images do not provide any information about the third parameter of motion ρ.

    JF(xi) = (m − l) [ 1 + xi²/(m − l)²  0  0;  0  1 + yi²/(m − l)²  0 ]    (23)

It is a well known result that it is always possible to generate a geometrically correct perspective image from an image acquired by a central projection catadioptric system (Baker and Nayar, 1998). The first image of Fig. 6 was acquired by an omnidirectional system that combines a parabolic mirror with an orthographic camera. A generic scene point with cartesian coordinates Xw is projected at position xi in the catadioptric image plane (see Fig. 4). The field of view is nearly 180°, thus all points in the world such that Z > 0 are visible. Assume a conventional perspective camera with projection center at the effective viewpoint O and optical axis aligned with the Z-axis.

Figure 6. Generating a geometrically correct perspective image.

A generic world point with coordinates Xw = (X, Y, Z)^t is projected in the perspective image at position xp = (X/Z, Y/Z)^t. The goal is to derive the function fp that maps points in the catadioptric image plane into points in the perspective image plane (xp = fp(xi)). This mapping function can be determined in a straightforward manner using the results of Table 4. As already mentioned, φ = arctan(xi/(m − l)) and ψ = arctan(yi/(m − l)). Replacing φ and ψ in the coordinate transformation T we are able to compute the world coordinates Xw as a function of the catadioptric image coordinates xi and ρ. Making xp = (X/Z, Y/Z)^t, the dependence on ρ disappears and xp is obtained as a function of xi. Function fp is presented in Eq. (24) for the parabolic system case (l = 1). The geometrically correct perspective image generated with the derived mapping function is presented in Fig. 6.

    fp(): (xi, yi) → ( 4p xi / (4p² − xi² − yi²),  −4p yi / (4p² − xi² − yi²) )    (24)

Spherical coordinates are broadly used in computer vision and robotics. For some applications it can be useful to reference world points using spherical coordinates. Assume, as an example, that we intend to use our omnidirectional camera in a surveillance application to obtain position information of a set of targets. The imaging system is fixed to the ceiling as depicted in Fig. 7. The target position can be referenced in a simple way using spherical coordinates Ωs = (φs, ψs, ρs)^t. The goal is to derive the function fs which transforms the catadioptric image coordinates xi into the two world spherical coordinates (φs, ψs) that can be fully recovered ((φs, ψs) = fs(xi)) without further restrictions.

Figure 7. Surveillance application using central panoramic imaging.

Xw = (X, Y, Z)^t can be computed in terms of xi and ρ. Since φ = arctan(xi/(m − l)) and ψ = arctan(yi/(m − l)), then Xw = T(arctan(xi/(m − l)), arctan(yi/(m − l)), ρ) (see Table 4). Replacing (X, Y, Z) in φs = arctan(X/Z) and ψs = arctan(Y/√(X² + Z²)) by the obtained result, the dependence on ρ is eliminated and function fs is obtained (Eq. (25)). In addition, if the height h of the room is known and if the imaged point is on the floor, one obtains ρs = h(4p² + xi² + yi²)/(4p² − xi² − yi²).

    fs(): (xi, yi) → ( arctan( 4p xi / (4p² − xi² − yi²) ),  arctan( −4p yi / √((4p² + xi² − yi²)² + 4 xi² yi²) ) )    (25)
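The warping functions fp and fs are straightforward to implement and can be cross-checked against the direct projection model. The sketch below is ours (parabolic case, arbitrary p); it projects a 3D point with Eq. (18) and verifies that Eqs. (24) and (25) recover the expected perspective point and spherical angles.

    import numpy as np

    p = 1.0   # mirror parameter (latus rectum 4p), example value

    def f_p(xi, yi):
        """Catadioptric image point -> perspective image point, Eq. (24)."""
        den = 4*p**2 - xi**2 - yi**2
        return 4*p*xi/den, -4*p*yi/den

    def f_s(xi, yi):
        """Catadioptric image point -> spherical angles (phi_s, psi_s), Eq. (25)."""
        phi_s = np.arctan(4*p*xi/(4*p**2 - xi**2 - yi**2))
        psi_s = np.arctan(-4*p*yi/np.sqrt((4*p**2 + xi**2 - yi**2)**2 + 4*xi**2*yi**2))
        return phi_s, psi_s

    # Cross-check against the direct projection of a 3D point (Eq. (18), l = 1, m - l = 2p).
    X, Y, Z = 0.8, -0.4, 1.5
    rho = np.sqrt(X**2 + Y**2 + Z**2)
    xi, yi = 2*p*X/(Z + rho), -2*p*Y/(Z + rho)

    print(f_p(xi, yi), (X/Z, Y/Z))                                        # should match
    print(f_s(xi, yi), (np.arctan(X/Z), np.arctan(Y/np.hypot(X, Z))))     # should match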

3. Imaging Systems with Motion

Section 2 focuses on static central projection imaging systems. The global mapping F from points in the 3D world into points in the 2D image plane is derived. It is shown that the mathematical expression of F depends on the coordinate system used to reference points in the scene and that an adequate choice of coordinates presents several advantages. In this situation the imaging system does not move, thus motion in the image plane only depends on 3D motion in the scene. This section focuses on active tracking of moving targets, where the goal is to use visual information to control camera motion such that the target image is kept constant. The mathematical framework derived in Section 2 is extended to central projection imaging systems with rigid motion. The global mapping between 3D world points and 2D image points is derived. Criteria to select adequate world coordinates are discussed. The results obtained are applied to active visual tracking using three different platforms and/or imaging sensors. As mentioned in Section 1, the platforms considered have less than 3 d.o.f. To control such constrained motion it is enough to work with the 3D target position and velocity. If nothing is stated it is assumed that Xw is a vector with the target mass center position in cartesian coordinates, Ω represents the target position in the established alternative system of coordinates and xi is the vector with the target image coordinates. Vector Θ is introduced to represent the camera/platform position in an inertial world reference frame.

3.1. Mapping Points from the 3D World into the 2D Image Plane

Consider the schematic of Fig. 1, which depicts the general mapping F performed by a static central projection imaging system. Consider P = R[I | −C], with R the rotation matrix between camera and world and C the projection center coordinates. Matrix P depends on the pose of the imaging system in world coordinates. If the imaging system is static then the matrix P is constant. On the other hand, if the camera moves, its pose changes, and matrix P is no longer constant. Assume the imaging system is mounted on a moving platform. The pose of the camera depends on the position Θ of the platform. Since matrix P depends on the pose of the camera, P is a function of Θ. The mapping between the scene and the image plane depends on the target 3D coordinates Ω and on the camera pose parameterized by Θ. The target image coordinates xi are given by Eq. (26), where the transformation F can be written as the composition of Eq. (27). The difference between Eqs. (2) and (27) is that matrix P is no longer constant and appears as a function of Θ.

    xi = F(Ω, Θ)    (26)
    F(Ω, Θ) = fi(P(Θ) · fh(T(Ω)))    (27)

The target velocity in the image ẋi is computed in (28). Equation (28) is obtained by differentiating Eq. (26) with respect to time. JF = [J_F^Ω | J_F^Θ] is the jacobian matrix of function F, Ω̇ is the target 3D velocity, and Θ̇ represents the camera/platform velocity. J_F^Ω is given in (29), with JΓ the jacobian matrix of the inverse coordinate transformation Γ. J_F^Θ is computed in (30) and does not depend on the jacobian matrix of the world coordinate transformation JΓ. The image velocity ẋi depends both on the target velocity Ω̇ and on the camera/platform velocity Θ̇. The second term in Eq. (28) is known in the literature as egomotion and represents the image motion induced by camera/platform motion.

    ẋi = J_F^Ω Ω̇ + J_F^Θ Θ̇    (28)
    J_F^Ω = Jfi · J_P^{Xh} · Jfh · JΓ^{-1}    (29)
    J_F^Θ = Jfi · J_P^Θ    (30)
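A small numerical illustration of the decomposition in Eq. (28) follows. The sketch is ours; for concreteness it uses the translating perspective camera of Section 3.3 with Cartesian target coordinates (T = identity), and estimates both Jacobian blocks by finite differences.

    import numpy as np

    # Numerical sketch of Eqs. (26)-(30) for a perspective camera translating
    # in the XY plane: P(Theta) corresponds to the offsets (-tX, -tY, 0).

    def F(Om, Th):                                   # Eq. (27), with T = identity
        X, Y, Z = Om                                 # here Omega are Cartesian coordinates
        tX, tY = Th
        x = np.array([X - tX, Y - tY, Z])            # P(Theta) fh(Xw)
        return np.array([x[0]/x[2], x[1]/x[2]])      # f_i

    def num_jac(f, x, eps=1e-6):
        f0 = np.asarray(f(x))
        return np.stack([(np.asarray(f(x + eps*e)) - f0)/eps for e in np.eye(len(x))], axis=1)

    Om, Th = np.array([1.0, 0.5, 4.0]), np.array([0.2, -0.1])
    Om_dot, Th_dot = np.array([0.1, 0.0, 0.2]), np.array([0.05, 0.05])

    J_Om = num_jac(lambda o: F(o, Th), Om)           # target block of the jacobian
    J_Th = num_jac(lambda t: F(Om, t), Th)           # camera block (egomotion term)
    xi_dot = J_Om @ Om_dot + J_Th @ Th_dot           # Eq. (28)
    print(xi_dot, J_Th @ Th_dot)                     # total image velocity, egomotion part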

The target image depends on the relative position between camera and target. Describing target motion in a coordinate frame attached to the imaging device simplifies the position/velocity relationships of Eqs. (26) to (30). The egomotion term disappears and the position/velocity in the image depends only on the position/velocity of the target in the camera system of coordinates. However, the platform position, provided, for example, by encoder readings, and the joint control commands are usually defined in an inertial base coordinate frame. If the control input is defined in the camera coordinate system then the transformations between the two reference systems cannot be avoided. In the position/velocity relationships of Eqs. (26) to (30), both camera and target motion are described in a common inertial system of coordinates. Errors in the image plane can be directly related to the control input commands in the task space, thus there is no need for additional coordinate transformations. Multiple cameras can be integrated in a natural way by describing the target motion in a common coordinate frame. The explicit computation of an egomotion term can be used for image segmentation (Sharkey et al., 1993; Batista et al., 1997; Barreto et al., 1999).

3.2. Criteria to Select the World Coordinate System

The mathematical expression of the global mapping F depends on the system of coordinates used to reference the target position in the scene (Eq. (27)). The intrinsic nature of the mapping does not depend on the selected coordinate frame. However, as seen in Section 2 for static central catadioptric imaging systems, an adequate choice of the coordinate system can be highly advantageous. Criteria to select suitable coordinate transformations Γ for tracking applications are discussed in the present section.

3.2.1. Analytical Solution for Visual Control of Motion in Tracking Applications. Consider the imaging system mounted on a moving platform. If nothing is stated assume a platform with 2 d.o.f. (Θ is a 2 × 1 vector). To perform active tracking of a moving target, the platform motion must be controlled such that the target projection is kept at constant position/velocity in the image plane. Assume that xd and ẋd are the desired target position and velocity in the image. Consider that Θc and Θ̇c are the position and velocity commands sent to the platform actuators, which have a unitary transfer function. Thus, Θc and Θ̇c must verify both xd = F(Ω, Θc) and ẋd = J_F^Ω · Ω̇ + J_F^Θ · Θ̇c, with Ω and Ω̇ the target 3D position and velocity (see Eqs. (26) to (30)). If nothing is stated assume in the sequel that at each frame time instant both the target position xi and velocity ẋi are measured in the image plane and that the platform position Θ and velocity Θ̇ are estimated using the encoder readings. The goal is to determine the position and velocity commands Θc and Θ̇c knowing xi, ẋi, Θ and Θ̇.

Proposition 1. Assume that it is possible to compute the target 3D position Ω from the target position in the image xi and the camera pose Θ. In a similar way consider that the camera position Θ can be uniquely calculated given the target position in the image xi and in the scene Ω. If these two conditions hold then, given the target position in the image xi and the camera pose Θ, it is possible to compute the camera position Θc such that the target is projected at a pre-defined position xd in the image plane. A similar statement can be made for velocity relationships.

Proof of Proposition 1: Function F computes the target position in the image xi given the target 3D position Ω and the camera pose Θ (Eq. (26)). Assume that there exists a function ω which enables the computation of Ω given both xi and Θ (Eq. (31)). In a similar way, using a function θ, Θ can be calculated knowing both xi and Ω (Eq. (32)).

    Ω = ω(xi, Θ)    (31)
    Θ = θ(xi, Ω)    (32)

Assume that the target image position xi is measured and the camera pose Θ is estimated from encoder readings. The world target 3D position is ω(xi, Θ). The camera position Θc such that the target image is projected at position xd is given in Eq. (33).

    Θc = θ(xd, ω(xi, Θ))    (33)

Equations (34) and (35) are obtained by differentiating Eqs. (31) and (32) with respect to time. Jω = [J_ω^i | J_ω^Θ] and Jθ = [J_θ^i | J_θ^Ω] are the jacobian matrices of functions ω and θ.

    Ω̇ = J_ω^i ẋi + J_ω^Θ Θ̇    (34)
    Θ̇ = J_θ^i ẋi + J_θ^Ω Ω̇    (35)

Knowing both the target velocity in the image ẋi and the camera velocity Θ̇ at the frame acquisition time instant, it is possible to compute the camera velocity Θ̇c such that the target is projected in the image plane with the desired velocity ẋd. This result is shown in Eq. (36), derived by differentiating (33) with respect to time.

    Θ̇c = J_θ^i ẋd + J_θ^Ω J_ω^i ẋi + J_θ^Ω J_ω^Θ Θ̇    (36)

Proposition 2. Consider the global mapping function F and the corresponding jacobian matrix JF = [J_F^Ω | J_F^Θ]. If there exists a function ω that enables the computation of the target 3D position from the target image position and the camera pose, then J_F^Ω must be invertible. In a similar way, if there exists a function θ that allows the calculation of the camera position from the target position in the image and in the world, then matrix J_F^Θ must have an inverse.

Proof of Proposition 2: Assume that the function ω exists (Eq. (31)). The mathematical relationship of Eq. (37) is obtained by replacing Ω̇ in Eq. (28) by the result of Eq. (34). Equation (38) is derived by replacing ẋi in (34) by the result presented in (28).

    ẋi = J_F^Ω J_ω^i ẋi + (J_F^Ω J_ω^Θ + J_F^Θ) Θ̇    (37)
    Ω̇ = J_ω^i J_F^Ω Ω̇ + (J_ω^i J_F^Θ + J_ω^Θ) Θ̇    (38)

If function ω exists then equalities (37) and (38) must hold. From these equalities it follows that J_ω^i J_F^Ω = J_F^Ω J_ω^i = I. Thus matrix J_F^Ω must have an inverse, which is J_ω^i. Considering that J_F^Ω J_ω^Θ + J_F^Θ = J_ω^i J_F^Θ + J_ω^Θ = 0 and making J_ω^i = (J_F^Ω)^{-1}, the results of Eq. (39) follow.

    J_ω^i = (J_F^Ω)^{-1}
    J_ω^Θ = −(J_F^Ω)^{-1} J_F^Θ    (39)

Consider now that function θ exists. In a similar way it can be proved that J_F^Θ must be invertible and that J_θ^i and J_θ^Ω are given by Eq. (40).

    J_θ^i = (J_F^Θ)^{-1}
    J_θ^Ω = −(J_F^Θ)^{-1} J_F^Ω    (40)

Equation (36) is rewritten in Eq. (41) using the results of Eqs. (39) and (40).

    Θ̇c = Θ̇ + (J_F^Θ)^{-1} (ẋd − ẋi)    (41)
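The control law of Eq. (41) is a one-liner once J_F^Θ is known. The sketch below is ours and uses a toy invertible 2 × 2 jacobian standing in for the real one, which in the paper follows from the chosen coordinate system and platform kinematics.

    import numpy as np

    def velocity_command(theta_dot, J_F_theta, xd_dot, xi_dot):
        """Eq. (41): platform velocity command from measured image velocity."""
        return theta_dot + np.linalg.solve(J_F_theta, xd_dot - xi_dot)

    J_F_theta = np.array([[-1.2, 0.1],
                          [0.0, -0.9]])          # toy egomotion jacobian (invertible)
    theta_dot = np.array([0.02, -0.01])          # current platform velocity (encoders)
    xi_dot    = np.array([0.15, -0.05])          # measured target image velocity
    xd_dot    = np.zeros(2)                      # fixation: desired image velocity is zero

    print(velocity_command(theta_dot, J_F_theta, xd_dot=xd_dot, xi_dot=xi_dot))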

3.2.2. Coordinate Transformation Constraints. In the previous section, Proposition 2 establishes the necessary conditions for the existence of functions ω and θ, and Proposition 1 establishes the sufficient conditions for the existence of an analytical solution for the tracking problem. Assuming a platform with 2 d.o.f., J_F^Θ is a 2 × 2 matrix. This matrix is invertible or not depending on the features of the system, in particular the kinematics of the robotic platform and the type of image sensor. The coordinate frame where the platform position Θ is defined is considered as a problem specification that cannot be changed. In general, the world target position is referenced by Ω = (φ, ψ, ρ)^t, which is a 3 × 1 vector. The corresponding jacobian matrix J_F^Ω is a 2 × 3 matrix similar to the one shown in Eq. (9). A 2 × 3 matrix is never invertible and, according to Proposition 2, it is not possible to define a function ω to recover the target 3D position. As mentioned in Section 2, the image formation process transforms points in R^3 into points in R^2. There is a loss of information and the world target position cannot be fully recovered from a single image. Nevertheless, as mentioned in Section 2.2, if F is written in a compact form then Ω can be partially recovered in a straightforward manner. If the recovered target 3D parameters are enough to derive function θ, then an analytical solution for the tracking problem can be reached (Proposition 1).

then an analytical solution for the tracking problem can be reached (Proposition 1). If it exists a transformation of coordinates Γ verifying the compactness constraint then it is possible to write the global mapping F in a compact form (Section 2.2). This means that F does not depend on ρ and that the third column of JΘ F is zero. In practice, if the compactness constraint is verified, we can replace ˆ = (φ, ψ)t in Eqs. (26) to (41) Ω = (φ, ψ, ρ)t by Ω Ω and assume that JF is a 2 × 2 matrix by discarding the null column. The verification of the compactness constraint is a necessary conditions for JΩ F being a square matrix with inverse. Considering the statement of Proposition 2, the verification of the compactness constraint is a necessary condition to exist an ω function. If both ω and θ exist then there is an analytical solution for the active tracking problem (Proposition 1). From the statements above we can conclude that it is desirable to select a transformation of coordinates Γ verifying the compactness constraint (Eq. (10)). If Γ also verifies the decoupling constraint (Eq. (11)) then F is written in a decoupled way which simplifies the calculations to obtain the tracking control laws (Eqs. (33) and (36)). Another useful guideline is to select transformation Γ such that target 3D position is referenced in a coordinate frame of the same type as the one where camera position Θ is defined. This can also lead to several simplifications in the calculation as well as a deeper understanding of the tracking task. In the sequel the derived mathematical framework is applied to active tracking of a moving target using three different robotic platforms. These examples help the understanding of the exposed ideas and illustrate the usefulness of a judicious selection of the world coordinate system. 3.3.

3.3. Active Tracking Using a Perspective Camera with Translation Motion in the XY Plane

Figure 8 depicts a conventional perspective camera with translation motion in the XOY plane. A schematic of the framework used to derive the global mapping F is shown in Fig. 1. The imaging sensor is a perspective camera with intrinsic parameters K = I, therefore function $f_i$ is the one shown in Eq. (12). The camera position in the XOY plane, in world cartesian coordinates, is $\Theta = (t_X, t_Y)^t$. For this particular case P(Θ) is given by Eq. (42). The goal of the application is to control the camera position and velocity such that the target position and velocity in the image are zero ($x_d = \dot x_d = 0$).


Figure 8. Active tracking using a perspective camera with translation motion.

Therefore the camera motion must be controlled in such a way as to keep the optical axis aligned with the target mass center.

\[ P(\Theta) = \begin{bmatrix} 1 & 0 & 0 & -t_X \\ 0 & 1 & 0 & -t_Y \\ 0 & 0 & 1 & 0 \end{bmatrix} \tag{42} \]

3.3.1. Selecting Coordinate Transformation Γ. In a similar way to what was done in Section 2, we intend to select a suitable coordinate transformation Γ (Eq. (5)) for this specific application. Functions $f_i$, P(Θ) and $f_h$ have already been defined. The jacobian matrix $J_F^\Omega$ can be written in terms of the partial derivatives of Γ by replacing $J_\Gamma$ in Eq. (29) by the result of Eq. (6). If we intend to select a coordinate transformation Γ verifying the compactness constraint, then the third column of matrix $J_F^\Omega$ must be zero (Eq. (10)). This yields the partial differential equations shown in Eq. (43).

\[ \begin{cases} Z(\phi_Y\psi_Z - \phi_Z\psi_Y) + (X - t_X)(\phi_Y\psi_X - \phi_X\psi_Y) = 0 \\ Z(\phi_Z\psi_X - \phi_X\psi_Z) + (Y - t_Y)(\phi_Y\psi_X - \phi_X\psi_Y) = 0 \end{cases} \tag{43} \]

The system of Eqs. (43) can be solved with respect to the partial derivatives of Γ. $J_\Gamma$ is a 3 × 3 matrix, which yields 9 unknowns for 2 equations. The problem is underdetermined and multiple solutions can be found. Some of the solutions obtained can be discarded by considering the additional constraint $\det(J_\Gamma) \neq 0$ (transformation Γ must be bijective). Following this procedure the result of Eq. (44) is derived. If the coordinate transformation Γ verifies the compactness constraint, then the structure of the corresponding jacobian matrix $J_\Gamma$ must be the one shown in Eq. (44). Notice that the provided jacobian matrix depends both on $\mathbf{X}_w = (X, Y, Z)^t$ and $\Theta = (t_X, t_Y)^t$. Any function Γ verifying Eq. (43) must depend not only on $\mathbf{X}_w$ but also on Θ. Γ is then no longer a transformation of inertial world cartesian coordinates, and we can conclude that it is not possible to find a transformation of coordinates verifying the compactness constraint.

\[ J_\Gamma = \begin{bmatrix} \phi_X & \phi_Y & -\dfrac{(X-t_X)\phi_X+(Y-t_Y)\phi_Y}{Z} \\[1mm] \psi_X & \psi_Y & -\dfrac{(X-t_X)\psi_X+(Y-t_Y)\psi_Y}{Z} \\[1mm] \rho_X & \rho_Y & \rho_Z \end{bmatrix} \tag{44} \]

The conclusions drawn from the discussion above were expected. If the camera has translation motion there is no way to suppress the dependence on the third coordinate, and it is not possible to perform the required motion control using only visual information. Some authors overcome the problem by assuming additional constraints, such as the target moving in a plane in the world (Corke, 1996; Krautgartner and Vincze, 1998). A systematic approach to determine whether there exists any coordinate transformation verifying the constraints specified in Section 2.2 has been presented. The proposed procedure establishes necessary conditions. The fulfillment of these conditions does not guarantee the existence of a desired coordinate transformation Γ. This systematic approach will be repeated in the next two cases.

3.4. Active Tracking Using a Perspective Camera with Pan and Tilt Rotation Motion

A perspective camera is mounted on a pan and tilt unit such that both rotation axes go through the optical center $O_c$. The camera first rotates in pan and then in tilt (Fick model), as depicted in Fig. 9.


Figure 9. Active tracking using a perspective camera mounted on a pan and tilt unit.


The camera position vector is $\Theta = (\alpha_p, \alpha_t)^t$, with $\alpha_p$ the pan angle and $\alpha_t$ the tilt angle. Consider the global mapping scheme depicted in Fig. 1. For this particular case function $f_i$ is provided in Eq. (12) (perspective camera with K = I), and Eq. (45) shows P(Θ), with $e^{-\alpha_p \hat y_c}$ and $e^{-\alpha_t \hat x_c}$ the pan and tilt rotation matrices (Murray et al., 1994). Unit vectors $\hat x_c$ and $\hat y_c$ are associated with the $X_c$ and $Y_c$ axes of the cartesian coordinate frame attached to the camera (not depicted in Fig. 9). The goal of the tracking application is to control the camera rotation such that both the target position and velocity in the image are zero ($x_d = \dot x_d = 0$). To achieve this the camera optical axis must be aligned with the target in 3D space.

\[ P(\Theta) = e^{-\alpha_t \hat x_c}\, e^{-\alpha_p \hat y_c}\, [\,I \mid 0\,] \tag{45} \]

3.4.1. Selecting Coordinate Transformation Γ. Functions $f_i$, P(Θ) and $f_h$, as well as the corresponding jacobian matrices, are already defined. Assume $J_\Gamma$ given by Eq. (6). According to Eq. (29), matrix $J_F^\Omega$ can be written in terms of the partial derivatives of the coordinate transformation Γ. The procedure described in Section 3.3.1 is repeated. If Γ verifies the compactness constraint then its jacobian matrix must have the structure shown in Eq. (46). For the pan and tilt tracking situation there is a solution $J_\Gamma$ that depends only on $\mathbf{X}_w$. Thus the existence of a cartesian coordinate transformation Γ verifying the compactness constraint is not excluded. Nevertheless, repeating the procedure to achieve a decoupled F function allows us to conclude that it is not possible to find a transformation Γ which verifies the decoupling constraint.

\[ J_\Gamma = \begin{bmatrix} \phi_X & -\dfrac{X\phi_X + Z\phi_Z}{Y} & \phi_Z \\[1mm] -\dfrac{Y\psi_Y + Z\psi_Z}{X} & \psi_Y & \psi_Z \\[1mm] \rho_X & \rho_Y & \rho_Z \end{bmatrix} \tag{46} \]

One of the suggested guidelines for the selection of transformation Γ is to reference the target 3D position in a coordinate frame of the same type as the one where the camera position Θ is defined. Consider the transformation from cartesian coordinates into spherical coordinates proposed in Eq. (47). The angles φ and ψ, used to reference the target 3D position, are similar to the camera pan and tilt angles (see Fig. 9). Moreover the jacobian matrix of the coordinate transformation of Eq. (47) follows the structure presented in Eq. (46), which assures that the compactness constraint is verified.

\[ \begin{cases} \phi = \arctan\left(\dfrac{X}{Z}\right) \\[1mm] \psi = \arctan\left(-\dfrac{Y}{\sqrt{X^2+Z^2}}\right) \\[1mm] \rho = \sqrt{X^2+Y^2+Z^2} \end{cases} \tag{47} \]
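A small sketch of the transformation of Eq. (47) and of its inverse T (given in Table 5); this is illustrative only, and arctan2 is used to keep the correct quadrant:

import numpy as np

def gamma(Xw):
    """Coordinate transformation of Eq. (47): cartesian -> (phi, psi, rho)."""
    X, Y, Z = Xw
    phi = np.arctan2(X, Z)                      # arctan(X/Z), quadrant-aware
    psi = np.arctan2(-Y, np.hypot(X, Z))        # arctan(-Y / sqrt(X^2 + Z^2))
    rho = np.linalg.norm(Xw)
    return np.array([phi, psi, rho])

def T(omega):
    """Inverse transformation (Table 5): (phi, psi, rho) -> cartesian."""
    phi, psi, rho = omega
    return np.array([rho * np.sin(phi) * np.cos(psi),
                     -rho * np.sin(psi),
                     rho * np.cos(phi) * np.cos(psi)])

Xw = np.array([0.5, -0.3, 2.0])                 # arbitrary test point
print(np.allclose(T(gamma(Xw)), Xw))            # round trip recovers the point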

Table 5 summarizes the results obtained using the coordinate transformation of Eq. (47). Both Γ and its inverse T are presented. Equation (48) defines an angular error vector ∆. The goal of the tracking application is to align the camera optical axis with the target mass center, which is verified whenever ∆ is zero. Notice that by using coordinates of the same type to reference both the target and the camera position, the dependence of the global mapping F on the angular tracking errors $\delta_p$ and $\delta_t$ becomes explicit. The third column of $J_F^\Omega$ is zero, which means that Γ verifies the compactness constraint. Due to lack of space the polynomials $\Upsilon_1$, $\Upsilon_2$ and $\Upsilon_3$ are presented at the bottom of the table.

\[ \Delta = (\delta_p, \delta_t)^t = (\phi - \alpha_p,\; \psi - \alpha_t)^t \tag{48} \]

3.4.2. Active Tracking Control Law. Assume that $\Theta_c$ and $\dot\Theta_c$ are the position and velocity commands that must be sent to the platform actuators to accomplish a specified task (the actuators' transfer function is assumed unitary). The goal is to track a moving object such that its projection is kept in the image center. The position command $\Theta_c$ can be obtained by making $x_d = 0$ in Eq. (33) ($\Theta_c = \theta(0, \omega(x_i, \Theta))$). Notice that the specified tracking task is accomplished by controlling the camera motion such that the optical axis becomes aligned with the target mass center. Therefore the position command must be $\Theta_c = \hat\Omega$ with $\hat\Omega = (\phi, \psi)^t$. Considering that $\hat\Omega = \Theta + \Delta$ (Eq. (48)), it results that $\Theta_c = \Theta + \Delta$. The explicit computation of the ω and θ functions can be avoided by determining the angular error ∆ as a function of the target position in the image $x_i$ and the camera pose Θ. In Table 5 the global mapping F is written in a compact form such that $x_i$ depends on $\hat\Omega$ and Θ. The 3D position parameters φ and ψ can be recovered from the target position in the image $(x_i, y_i)$ and the camera pose $(\alpha_p, \alpha_t)$ (Eq. (49)). The derivation of the angular error vector ∆ as a function of the target position in the image and the camera pose is then straightforward. The position command $\Theta_c$ is obtained by making $\Theta_c = \Theta + \Delta$ (Table 6).


Table 5. Using a spherical coordinate system for active tracking with a pan and tilt perspective camera.

\[ \Gamma(\mathbf{X}_w) = \begin{bmatrix} \arctan\left(\frac{X}{Z}\right) \\ \arctan\left(-\frac{Y}{\sqrt{X^2+Z^2}}\right) \\ \sqrt{X^2+Y^2+Z^2} \end{bmatrix}, \qquad T(\Omega) = \begin{bmatrix} \rho\sin(\phi)\cos(\psi) \\ -\rho\sin(\psi) \\ \rho\cos(\phi)\cos(\psi) \end{bmatrix} \]

\[ F(\Omega,\Theta) = \begin{bmatrix} \dfrac{C(\psi)S(\phi-\alpha_p)}{C(\psi)C(\alpha_t)C(\phi-\alpha_p)+S(\psi)S(\alpha_t)} \\[2mm] -\dfrac{S(\psi)C(\alpha_t)-C(\psi)S(\alpha_t)C(\phi-\alpha_p)}{C(\psi)C(\alpha_t)C(\phi-\alpha_p)+S(\psi)S(\alpha_t)} \end{bmatrix} \]

\[ J_F^{\Omega}(\Omega,\Theta) = \begin{bmatrix} \dfrac{C(\psi)(S(\psi)S(\alpha_t)C(\phi-\alpha_p)+C(\psi)C(\alpha_t))}{\Upsilon_1} & -\dfrac{S(\alpha_t)S(\phi-\alpha_p)}{\Upsilon_1} & 0 \\[2mm] -\dfrac{C(\psi)S(\psi)S(\phi-\alpha_p)}{\Upsilon_1} & -\dfrac{C(\phi-\alpha_p)}{\Upsilon_1} & 0 \end{bmatrix} \]

\[ J_F^{\Theta}(\Omega,\Theta) = \begin{bmatrix} -\dfrac{C(\psi)(S(\psi)S(\alpha_t)C(\phi-\alpha_p)+C(\psi)C(\alpha_t))}{\Upsilon_1} & \dfrac{C(\psi)\Upsilon_2}{\Upsilon_1} \\[2mm] \dfrac{C(\psi)S(\psi)S(\phi-\alpha_p)}{\Upsilon_1} & \dfrac{\Upsilon_3}{\Upsilon_1} \end{bmatrix} \]

\[ \Upsilon_1 = C(\psi)^2C(\alpha_t)^2\left(S(\phi)^2+S(\alpha_p)^2+2C(\phi)C(\alpha_p)C(\phi-\alpha_p)\right) + S(\psi)^2 - C(\alpha_t)^2 + 2S(\alpha_t)C(\alpha_t)S(\psi)C(\psi)C(\phi-\alpha_p) \]
\[ \Upsilon_2 = C(\psi)S(\alpha_t)\left(S(\alpha_p)C(\alpha_p) + 2C(\alpha_p)C(\phi)S(\phi-\alpha_p) - S(\phi)C(\phi)\right) - S(\psi)C(\alpha_t)S(\phi-\alpha_p) \]
\[ \Upsilon_3 = 1 - C(\psi)^2\left(C(\phi)^2 + C(\alpha_p)^2 - 2C(\alpha_p)C(\phi)C(\phi-\alpha_p)\right) \]

Table 6. Position and velocity commands for active tracking with a pan and tilt perspective camera.

\[ \Theta_c = \Theta + \begin{bmatrix} \arctan\left(\dfrac{x_i}{y_i S(\alpha_t)+C(\alpha_t)}\right) \\[3mm] \arctan\left(\dfrac{\frac{y_i-\tan(\alpha_t)}{1+y_i\tan(\alpha_t)} + \tan(\alpha_t)\sqrt{1+\frac{x_i^2}{(y_iS(\alpha_t)+C(\alpha_t))^2}}}{\frac{y_i-\tan(\alpha_t)}{1+y_i\tan(\alpha_t)}\tan(\alpha_t) - \sqrt{1+\frac{x_i^2}{(y_iS(\alpha_t)+C(\alpha_t))^2}}}\right) \end{bmatrix} \]

\[ \dot\Theta_c = \dot\Theta + \frac{1}{(1+x_i^2+y_i^2)(y_iS(\alpha_t)+C(\alpha_t))} \begin{bmatrix} 1+y_i^2 & -x_iy_i \\ x_i(y_iC(\alpha_t)-S(\alpha_t)) & -(y_iS(\alpha_t)+(1+x_i^2)C(\alpha_t)) \end{bmatrix} \dot x_i \]


\[ \begin{cases} \phi = \arctan\left(\dfrac{(y_i\sin(\alpha_t)+\cos(\alpha_t))\sin(\alpha_p) + x_i\cos(\alpha_p)}{(y_i\sin(\alpha_t)+\cos(\alpha_t))\cos(\alpha_p) - x_i\sin(\alpha_p)}\right) \\[2mm] \psi = -\arctan\left(\dfrac{y_i\cos(\alpha_t)-\sin(\alpha_t)}{\sqrt{x_i^2+(y_i\sin(\alpha_t)+\cos(\alpha_t))^2}}\right) \end{cases} \tag{49} \]
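A sketch of the recovery of Eq. (49), assuming normalized image coordinates (K = I); the numeric values below are arbitrary:

import numpy as np

def target_angles(xi, yi, alpha_p, alpha_t):
    """Target direction (phi, psi) recovered from the image point and camera pose, Eq. (49)."""
    a = yi * np.sin(alpha_t) + np.cos(alpha_t)
    phi = np.arctan2(a * np.sin(alpha_p) + xi * np.cos(alpha_p),
                     a * np.cos(alpha_p) - xi * np.sin(alpha_p))
    psi = -np.arctan2(yi * np.cos(alpha_t) - np.sin(alpha_t),
                      np.hypot(xi, a))
    return phi, psi

# Position command Theta_c = (phi, psi): align the optical axis with the target.
alpha_p, alpha_t = 0.2, -0.1
xi, yi = 0.15, -0.05
phi, psi = target_angles(xi, yi, alpha_p, alpha_t)
theta_c = np.array([phi, psi])
print(theta_c)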

Consider the jacobian matrices of Table 5. Replacing the angles φ and ψ by the result of Eq. (49), both $J_F^\Omega$ and $J_F^\Theta$ can be written in terms of the target position in the image and the camera pose. Equation (50) shows $\hat J_F^\Omega$, such that $J_F^\Omega = [\hat J_F^\Omega \mid 0]$, and Eq. (51) gives $J_F^\Theta$. Making $\dot x_d = 0$ in Eq. (41), the velocity command $\dot\Theta_c$ is obtained (Table 6).

\[ \hat J_F^{\Omega} = \begin{bmatrix} y_iS(\alpha_t)+(1+x_i^2)C(\alpha_t) & -\dfrac{x_iS(\alpha_t)(1+x_i^2+y_i^2)}{\sqrt{x_i^2+(y_iS(\alpha_t)+C(\alpha_t))^2}} \\[2mm] (y_iC(\alpha_t)-S(\alpha_t))x_i & -\dfrac{(y_iS(\alpha_t)+C(\alpha_t))(1+x_i^2+y_i^2)}{\sqrt{x_i^2+(y_iS(\alpha_t)+C(\alpha_t))^2}} \end{bmatrix} \tag{50} \]

\[ J_F^{\Theta} = \begin{bmatrix} -(y_iS(\alpha_t)+(1+x_i^2)C(\alpha_t)) & x_iy_i \\ -(y_iC(\alpha_t)-S(\alpha_t))x_i & 1+y_i^2 \end{bmatrix} \tag{51} \]

If $y_i = -\cot(\alpha_t)$ then $y_i\sin(\alpha_t)+\cos(\alpha_t) = 0$, matrix $J_F^\Theta$ becomes non-invertible and a singularity occurs in the derived expressions of $\Theta_c$ and $\dot\Theta_c$. This happens whenever the target image lies on the horizontal line that contains the intersection point of the pan rotation axis with the image plane. For this case $\delta_p = \pm\pi/2$ and $\tan(\delta_p) = \pm\infty$.


Figure 10. The modular vision system (MVS). Left: The MVS robotic eye. Right: The MVS platform with two robotic eyes and a parabolic camera.

3.5. Tracking Applications Using the MDOF and the MVS Robotic Platforms


In our laboratory two robotic platforms have been developed for active tracking: the MDOF robot head (Batista et al., 1995) and the MVS modular platform. The MDOF head has mainly been used in monocular and binocular tracking of a single target (Batista et al., 1997; Barreto et al., 1999a, 1999b). The MVS head has recently been built with the purpose of working on simultaneous tracking of multiple targets (Barreto et al., 1999c). Figure 10 (Left) depicts one of the MVS robotic eyes. The camera has two degrees of freedom: pan and tilt. The rotation axes go through the optical center and the camera undergoes pure rotation motion. This section describes the application of the equations derived in Section 3.4 to monocular tracking. Two different behaviors must be considered: the saccadic motion and the smooth pursuit (Batista et al., 1995, 1997). The active vision system is in standby until the target appears in the field of view. Initially the object is projected somewhere in the image periphery. The goal of the saccadic motion is to position the target image in the foveal region. The performance of the saccadic control is highly dependent on the accuracy of the angular error estimation. The importance of using the exact position command of Table 6 to accomplish the task is shown. After a successful saccadic motion the object is projected near the image center. The smooth pursuit behavior adjusts the system position such that the target is kept in the image center. We show that for the smooth pursuit the global mapping F can be approximated by a function $\bar F$ which yields simpler mathematical expressions and decoupled control of the pan and tilt d.o.f. Function $\bar F$ is established assuming that the target is projected near the image center during tracking. The approximation errors are studied.

3.5.1. Saccadic Motion. As mentioned, the angular error $\Delta = (\delta_p, \delta_t)^t$ (Eq. (48)) can be written as a function of the target position in the image and of the camera pose (Table 6). Solving the system of equations with respect to $x_i$ and $y_i$ yields the result of Eq. (52).

\[ \begin{cases} x_i = (y_i S(\alpha_t) + C(\alpha_t))\tan(\delta_p) \\[1mm] y_i = -\dfrac{(S(\alpha_t)^2 C(\delta_p) + C(\alpha_t)^2)\tan(\delta_t) + S(\alpha_t)C(\alpha_t)(1 - C(\delta_p))}{(S(\alpha_t)^2 + C(\alpha_t)^2 C(\delta_p)) + S(\alpha_t)C(\alpha_t)(1 - C(\delta_p))\tan(\delta_t)} \end{cases} \tag{52} \]

The saccadic motion consists in rotating the camera such that the target image jumps from the periphery to the center of the retina. Table 6 provides the exact position command $\Theta_c$ to accomplish the task. If command $\Theta_c$ is sent to the pan and tilt motors, then the angular error vector ∆ becomes null. The saccadic motion is perfect because the target projection moves from its initial position on the image periphery to the center (replace $\delta_p$, $\delta_t$ by zero in Eq. (52)). The results presented in Table 6 are novel. In Chroust et al. (2000) the equations to track an object moving in a plane with known depth are derived. The control equations proposed in Table 6 generalize this result since they assume unconstrained target motion. In Sharkey et al. (1993) and Batista et al. (1997) the angular error ∆ is approximated by $\bar\Delta = (\arctan(x_i), -\arctan(y_i))^t$. This approximation is rather intuitive and simplifies the derivation of the control law. However the performance of the saccadic control tends to be poor, as shown in Fig. 11. Consider that the position command is $\bar\Theta_c = \Theta + \bar\Delta$. After the saccadic motion, the target image moves towards the foveal region since the angles $\delta_p$ and $\delta_t$ decrease. However the


error angles do not necessarily become zero, nor is the target projected in the image center. Figure 11 shows the distance to the image center after the saccadic motion for three different camera tilt positions. The X and Y axes correspond to the initial pan and tilt errors $(\delta_p, \delta_t)$, which take values in the range [−60°, 60°]. The Z axis is the normalized distance to the image center after the saccadic motion. To convert into image pixels the value must be multiplied by $k \cdot f$, with k the number of pixels per metric unit and f the camera focal length (in our case $k \cdot f = 225$ for both the MDOF and MVS robot heads). Observing Fig. 11, it follows that the performance of the saccadic motion decreases when the angular errors increase. Since the target starts by appearing in the image periphery, both $\delta_p$ and $\delta_t$ take in general high values. One concludes that the performance of the saccadic control using command $\bar\Theta_c$ tends to be poor. The approximation $\bar\Delta = (\arctan(x_i), -\arctan(y_i))^t$ is only valid for small angular errors and camera tilt angles. The command equation of Table 6 must be used to control the saccadic motion.

3.5.2. Function $\bar F$ for Smooth Pursuit. In general the active tracking process is initialized by the saccadic motion. The camera pose changes abruptly such that the target projection jumps from the image periphery to the foveal region. After the saccadic motion the camera orientation is smoothly adjusted to keep the target image in the center of the retina. This stage is called the smooth pursuit control. The global mapping function F is shown in Table 5. In the smooth pursuit it is reasonable to assume that most of the time the target projection is near the image center. This assumption is used to derive an approximate mapping $\bar F$. Equation (53) results from making $y_i = 0$ in the first equation of (52) and $\delta_p = 0$ in the second one. Function $\bar F$ is an approximation of the global mapping F with simpler mathematical expressions and the advantage of decoupling the pan and tilt control.

\[ \bar F(\Omega, \Theta) = \begin{bmatrix} \cos(\alpha_t)\tan(\delta_p) \\ -\tan(\delta_t) \end{bmatrix} \tag{53} \]

Figure 11. Distance to the image center after the saccadic motion. The position command is $\bar\Theta_c = \Theta + \bar\Delta$ with $\bar\Delta = (\arctan(x_i), -\arctan(y_i))^t$. Figures correspond to camera tilt positions of 23° (top), 0° (middle) and −23° (bottom).


Consider the approximation error vector $E = x_i - \bar x_i$ (Eq. (54)), measured in the image plane. The exact image target position is $x_i = F(\Omega, \Theta)$, and $\bar x_i = \bar F(\Omega, \Theta)$ is the approximate one. Error E depends on the camera


tilt position $\alpha_t$ and on the angular tracking error ∆.

\[ E(\alpha_t, \Delta) = (E_x(\alpha_t, \delta_p, \delta_t),\; E_y(\alpha_t, \delta_p, \delta_t))^t \tag{54} \]
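A sketch evaluating the approximation error of Eq. (54) at a single configuration, using Eq. (52) for the exact image position and Eq. (53) for the approximation (function names and values are illustrative):

import numpy as np

def exact_image(alpha_t, dp, dt):
    """Exact target image position in terms of the tracking errors, Eq. (52)."""
    S, C = np.sin(alpha_t), np.cos(alpha_t)
    num = (S**2 * np.cos(dp) + C**2) * np.tan(dt) + S * C * (1 - np.cos(dp))
    den = (S**2 + C**2 * np.cos(dp)) + S * C * (1 - np.cos(dp)) * np.tan(dt)
    yi = -num / den
    xi = (yi * S + C) * np.tan(dp)
    return np.array([xi, yi])

def approx_image(alpha_t, dp, dt):
    """Approximate mapping F_bar of Eq. (53)."""
    return np.array([np.cos(alpha_t) * np.tan(dp), -np.tan(dt)])

alpha_t, dp, dt = np.radians(23.0), np.radians(10.0), np.radians(5.0)
E = exact_image(alpha_t, dp, dt) - approx_image(alpha_t, dp, dt)   # Eq. (54)
print(E)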


Figure 12 helps to understand the approximation function $\bar F$. Each figure represents the image plane for three different camera tilt positions. If $\alpha_t$ is known then the target position in the image depends on the angular tracking error ∆. Both the exact ($x_i$) and approximate ($\bar x_i$) target positions in the image are calculated for a set of values of ∆. The computed values of $x_i$ and $\bar x_i$ are represented in the image plane. The exact positions on the solid grid are approximated by the positions of the dashed grid. Conclusions about $E = (E_x, E_y)^t$ can be drawn by observing Fig. 12. Assume that the angular pan error $\delta_p$ is constant along time. The target is positioned somewhere in a vertical plane in the 3D world, going through the origin O of the inertial coordinate frame. This plane is projected into a line in the image. If $\alpha_t = 0$ the line is vertical; if $\alpha_t \neq 0$ the line has a slope whose modulus is inversely proportional to the modulus of the camera tilt angle. Using the approximation of Eq. (53), the mentioned plane containing the target is projected into a vertical line in the image. As depicted in Fig. 12, $\bar x_i = x_i$ whenever $\alpha_t = 0$ or $y_i = 0$. Some properties of the error function $E_x(\alpha_t, \delta_p, \delta_t)$ can be observed:
• $E_x(0, \delta_p, \delta_t) = 0$
• $E_x(\alpha_t, \delta_p, \delta_t) = -E_x(\alpha_t, -\delta_p, \delta_t)$
• $E_x(\alpha_t, \delta_p, \delta_t) = E_x(-\alpha_t, \delta_p, -\delta_t)$








−0.5


Consider now that the angular tilt error $\delta_t$ is constant. In 3D space the target is somewhere on a cone with vertex at the origin O of the inertial coordinate frame. The image projections of these surfaces are the hyperbolic lines shown in Fig. 12. Whenever $\delta_t = -\alpha_t$ the conic surface degenerates into the XOZ plane, whose projection is the horizontal line also observable in the figure. The approximation of Eq. (53) generates horizontal lines in the image. Notice that $\bar y_i = y_i$ whenever $\delta_t = -\alpha_t$ or $\delta_p = 0$. The following properties of the error function $E_y(\alpha_t, \delta_p, \delta_t)$ can be observed:
• $E_y(\alpha_t, 0, \delta_t) = 0$
• $E_y(\alpha_t, \delta_p, -\alpha_t) = 0$
• $E_y(\alpha_t, \delta_p, \delta_t) = E_y(\alpha_t, -\delta_p, \delta_t)$
• $E_y(\alpha_t, \delta_p, \delta_t) = -E_y(-\alpha_t, -\delta_p, -\delta_t)$

A statistical characterization of the angular tracking error $\Delta = (\delta_p, \delta_t)^t$ has been done experimentally.




Figure 12. Approximating the global mapping function. Exact (o) and approximate (*) target positions in the image are calculated for the set of pairs $(\delta_p, \delta_t)$ with $\delta_p, \delta_t \in \{-46°, -23°, 0°, 23°, 46°\}$. Figures correspond to camera tilt positions of 23° (top), 0° (middle) and −23° (bottom).



Figure 13. Statistical analysis of the approximation error function E. Both $\delta_p$ and $\delta_t$ are assumed to have independent gaussian probability distributions with average 0°. The standard deviations of the pan and tilt angular tracking errors are 12° and 8° respectively. Left: Averages of the approximation errors $E_x$ (-) and $E_y$ (- -). Right: Standard deviations of the approximation errors $E_x$ (-) and $E_y$ (- -).

Assume that both $\delta_p$ and $\delta_t$ have a gaussian probability distribution with average 0° (the target is almost always near the center of the image). The standard deviation of $\delta_p$ is 12° and the standard deviation of $\delta_t$ is 8°. The pan and tilt errors are statistically independent, thus the covariance is zero. The approximation errors $E_x$ and $E_y$ (Eq. (54)) depend on $\alpha_t$ and ∆. Given the camera tilt position $\alpha_t$ and the statistical characterization of the angular tracking error ∆, the statistical properties of $E_x$ and $E_y$ can be derived. Figure 13 shows the averages $\mu_x$ and $\mu_y$, the standard deviations $\sigma_x$ and $\sigma_y$ and the covariance $\sigma_{xy}$ as a function of the camera tilt angle $\alpha_t$. $\mu_x$ and $\sigma_{xy}$ are zero because $E_x$ is an odd function of $\delta_p$. $\mu_y$ is an odd function of $\alpha_t$ because $E_y(\alpha_t, \delta_p, \delta_t) = -E_y(-\alpha_t, -\delta_p, -\delta_t)$. The figure presents normalized values for the statistical parameters. To convert into image pixels the value must be multiplied by $k \cdot f$, with k the number of pixels per metric unit and f the camera focal length (in our case $k \cdot f = 225$ for both the MDOF and MVS robot heads). The approximation error E tends to increase with the tracking angular errors ∆. E also increases with the absolute value of the camera tilt angle. If $|\alpha_t|$ is high then significant approximation errors arise even when the target is projected near the image center. If we intend to replace the global mapping F by the function $\bar F$ then both ∆ and $\alpha_t$ must be small, which means that the target image must be near the center and the operating range of the tilt d.o.f. can not be large.
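The statistics of Fig. 13 can be approximated by simple Monte Carlo sampling. The sketch below is illustrative only; it assumes the gaussian error model stated above and reports normalized values:

import numpy as np

def approximation_error(alpha_t, dp, dt):
    """E = F - F_bar (Eq. (54)), with F from Eq. (52) and F_bar from Eq. (53)."""
    S, C = np.sin(alpha_t), np.cos(alpha_t)
    num = (S**2 * np.cos(dp) + C**2) * np.tan(dt) + S * C * (1 - np.cos(dp))
    den = (S**2 + C**2 * np.cos(dp)) + S * C * (1 - np.cos(dp)) * np.tan(dt)
    yi = -num / den
    xi = (yi * S + C) * np.tan(dp)
    return np.stack([xi - C * np.tan(dp), yi + np.tan(dt)])

rng = np.random.default_rng(0)
dp = np.radians(rng.normal(0.0, 12.0, 100_000))   # pan error, sigma = 12 deg
dt = np.radians(rng.normal(0.0, 8.0, 100_000))    # tilt error, sigma = 8 deg
E = approximation_error(np.radians(23.0), dp, dt)
print(E.mean(axis=1), E.std(axis=1))              # normalized units; multiply by k*f = 225 for pixels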

3.5.3. Velocity Relationships. From Eq. (28) it results that the target velocity in the image is the sum $\dot x_i = \dot x_{ind} + \dot x_{ego}$, with $\dot x_{ind} = J_F^\Omega\dot\Omega$ the velocity induced by the target motion, and $\dot x_{ego} = J_F^\Theta\dot\Theta$ the velocity induced by the camera motion (egomotion). The first two rows of Fig. 14 depict the image velocity fields $\dot x_{ego}$ when the camera moves in pan and in tilt. Different camera tilt angles are used. The goal of our tracking application is to keep a zero target image velocity ($\dot x_i = 0$). Considering that $\dot x_i = \dot x_{ind} + \dot x_{ego}$, the camera velocity must be $\dot\Theta_c$ such that the egomotion compensates for the image velocity component induced by the target motion in the scene ($\dot x_{ego} = -\dot x_{ind}$). The velocity command $\dot\Theta_c$ is written in Eq. (55) as a function of the target velocity in the world (the compactness constraint is verified, therefore $\dot\rho$ does not play any role). Notice that if $\dot\psi = 0$ then perfect tracking is achieved for $\dot\Theta_c = (\dot\phi, 0)^t$: the image velocity induced by the target motion is compensated for by the camera pan rotation. In general the decoupling between pan and tilt control does not hold, as can be seen in the last row of Fig. 14.

\[ \dot\Theta_c = -\left(J_F^\Theta\right)^{-1}\hat J_F^\Omega \begin{bmatrix} \dot\phi \\ \dot\psi \end{bmatrix} = \begin{bmatrix} 1 & \dfrac{x_i(y_iC(\alpha_t)-S(\alpha_t))}{(y_iS(\alpha_t)+C(\alpha_t))\sqrt{x_i^2+(y_iS(\alpha_t)+C(\alpha_t))^2}} \\[2mm] 0 & \dfrac{\sqrt{x_i^2+(y_iS(\alpha_t)+C(\alpha_t))^2}}{y_iS(\alpha_t)+C(\alpha_t)} \end{bmatrix} \begin{bmatrix} \dot\phi \\ \dot\psi \end{bmatrix} \tag{55} \]



Figure 14. Image velocity field. First column: $\alpha_t = -23°$; second column: $\alpha_t = 0°$; third column: $\alpha_t = 23°$. First row: $\dot\Theta = (1, 0)^t$, $\dot\Omega = (0, 0)^t$ (rad/s). Second row: $\dot\Theta = (0, 1)^t$, $\dot\Omega = (0, 0)^t$ (rad/s). Third row: $\dot\Theta = (0, 1)^t$, $\dot\Omega = (0, 1)^t$ (rad/s). Large rectangle (-): image with a field of view of 86° × 86°. Small rectangle (:): image with a field of view of 24° × 18°.

The jacobian matrices $\hat J_F^\Omega$ and $J_F^\Theta$ can be approximated by $\bar J_F^\Omega$ and $\bar J_F^\Theta$. The result of Eq. (56) is derived assuming in Eqs. (50) and (51) that the target is projected in the center of the image ($x_i = 0$). The approximated jacobian matrices yield simpler mathematical expressions and decoupled pan and tilt control. The egomotion term is approximated by $\dot{\bar x}_{ego} = (\cos(\alpha_t)\dot\alpha_p, \dot\alpha_t)^t$, which is independent of the image position. As can be observed in Fig. 14, the egomotion is only constant around the image center. The approximated velocity command $\dot{\bar\Theta}_c$ is derived by making $x_i = 0$ in Eq. (55). As a result $\dot{\bar\Theta}_c = (\dot{\bar\phi}, \dot{\bar\psi})^t$, with the target world velocity approximated by $(\dot{\bar\phi}, \dot{\bar\psi})^t = (\dot x_i/\cos(\alpha_t), \dot y_i)^t$. As can be seen in Fig. 14, the approximation yielding decoupled

pan and tilt control is only valid for the central area of the image.

\[ \bar J_F^\Omega = -\bar J_F^\Theta = \begin{bmatrix} \cos(\alpha_t) & 0 \\ 0 & -1 \end{bmatrix} \tag{56} \]
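A sketch of the decoupled smooth-pursuit command implied by Eq. (56); here $\dot x_i$, $\dot y_i$ are assumed to be the target-induced image velocity (egomotion already removed), and the signs follow the text above:

import numpy as np

def smooth_pursuit_rates(xi_dot, yi_dot, alpha_t):
    """Decoupled pan/tilt rates of the near-center approximation (Eq. (56))."""
    return np.array([xi_dot / np.cos(alpha_t),   # pan compensates the horizontal image velocity
                     yi_dot])                    # tilt compensates the vertical image velocity

print(smooth_pursuit_rates(0.03, -0.01, np.radians(15.0)))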

3.6. Active Tracking Using an Omnidirectional Camera with a Rotational Degree of Freedom Around the Z Axis

Figure 15 shows a picture of the MVS vision system. The robotic eyes of Fig. 10 (Left) are mounted on the tips of a rotative platform. We intend to use the omnidirectional camera to control the platform rotation.



Figure 15. Active tracking using an omnidirectional camera.

The catadioptric system is mounted in the center of the robotic platform (Fig. 10 (Right)). The platform rotation axis goes through the catadioptric effective viewpoint O and the re-projection center $O_c$. The goal of the tracking application is to control the rotation angle α such that the target image lies on the $Y_i$ axis of the catadioptric image plane. Figure 15 represents the setup. Function $f_i$ in the global mapping scheme of Fig. 1 is provided in Eq. (18) (our catadioptric system is parabolic, thus l = 1). P(Θ) is given in Eq. (57), where $e^{-\alpha\hat z}$ is the rotation matrix around the Z axis (Fig. 15) (Murray et al., 1994). In this case the vector Θ has dimension 1 × 1 (Θ = α). In the sequel the vectors Θ and $\Theta_c$ will be replaced by α and $\alpha_c$ to reference the camera pose and command. The goal of the tracking application is to control the camera rotation such that the X target image position and velocity are zero. The desired image target position is $x_d = (0, y_d)$ and the desired velocity is $\dot x_d = (0, \dot y_d)$, with $y_d$ and $\dot y_d$ arbitrary values.

\[ P(\Theta) = e^{-\alpha\hat z}\,[\,I \mid 0\,] \tag{57} \]


3.6.1. Selecting Coordinate Transformation Γ. The procedure explained in Section 3.3.1 is repeated to select a suitable world coordinate frame to reference the target position. For this particular case it is possible to choose a coordinate transformation Γ verifying the compactness constraint. Nevertheless it is not possible to write the global mapping F in a decoupled form. The proposed change of coordinates is given in Eq. (58). It is similar to the coordinate transformation used in static catadioptric imaging (Eq. (22)). The only difference is in the φ coordinate, which is defined according to the camera position parameter α. The projective ray constraining the target is defined by the intersection of a vertical plane, referenced by φ, with a horizontal conic surface, indexed by ψ. The proposed coordinate system can not be used to reference points lying on the XOZ plane (Fig. 15). We assume that the target is always in front of the MVS system (Y > 0). In this case the transformation of Eq. (58) is bijective.

\[ \begin{cases} \phi = \arctan\left(-\dfrac{X}{Y}\right) \\[1mm] \psi = \arctan\left(-\dfrac{Y}{Z + l\sqrt{X^2+Y^2+Z^2}}\right) \\[1mm] \rho = \sqrt{X^2+Y^2+Z^2} \end{cases} \tag{58} \]





Table 7 summarizes the results obtained. The coordinate transformation Γ and its inverse T are presented. The global mapping F is written using the new world coordinates Ω = (φ, ψ, ρ)t . The corresponding jacobian matrices are presented as well. Notice that the third column of JΩ F is zero, which means that transformation Γ verifies the compactness constraint.

Table 7. New coordinate system for active tracking with an omnidirectional camera.

\[ \Gamma(\mathbf{X}_w) = \begin{bmatrix} \arctan\left(-\frac{X}{Y}\right) \\ \arctan\left(-\frac{Y}{Z+l\sqrt{X^2+Y^2+Z^2}}\right) \\ \sqrt{X^2+Y^2+Z^2} \end{bmatrix} \]

\[ T(\Omega) = \frac{\rho}{1+\tan(\psi)^2(1+\tan(\phi)^2)} \begin{bmatrix} \tan(\phi)\tan(\psi)\left(l+\sqrt{1+(1-l^2)\tan(\psi)^2(1+\tan(\phi)^2)}\right) \\ -\tan(\psi)\left(l+\sqrt{1+(1-l^2)\tan(\psi)^2(1+\tan(\phi)^2)}\right) \\ \sqrt{1+(1-l^2)\tan(\psi)^2(1+\tan(\phi)^2)} - l\tan(\psi)^2(1+\tan(\phi)^2) \end{bmatrix} \]

\[ F(\Omega,\alpha) = (m-l)\cos(\alpha)\tan(\psi)\begin{bmatrix} \tan(\phi)-\tan(\alpha) \\ 1+\tan(\phi)\tan(\alpha) \end{bmatrix} \]

\[ J_F^{\Omega}(\Omega,\alpha) = (m-l)\cos(\alpha)\begin{bmatrix} \dfrac{\tan(\psi)}{\cos(\phi)^2} & \dfrac{\tan(\phi)-\tan(\alpha)}{\cos(\psi)^2} & 0 \\[2mm] \dfrac{\tan(\psi)\tan(\alpha)}{\cos(\phi)^2} & \dfrac{1+\tan(\phi)\tan(\alpha)}{\cos(\psi)^2} & 0 \end{bmatrix} \]

\[ J_F^{\alpha}(\Omega,\alpha) = -(m-l)\cos(\alpha)\tan(\psi)\begin{bmatrix} 1+\tan(\phi)\tan(\alpha) \\ -(\tan(\phi)-\tan(\alpha)) \end{bmatrix} \]
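A sketch of the mapping of Table 7, checking that the target image falls on the $Y_i$ axis ($x_i = 0$) whenever φ = α; the mirror parameter m below is a hypothetical value, with l = 1 for the parabolic case:

import numpy as np

def omni_image(phi, psi, alpha, m, l=1.0):
    """Image point of Table 7: F(Omega, alpha) for the rotating catadioptric camera."""
    k = (m - l) * np.cos(alpha) * np.tan(psi)
    return np.array([k * (np.tan(phi) - np.tan(alpha)),
                     k * (1.0 + np.tan(phi) * np.tan(alpha))])

xi, yi = omni_image(phi=0.4, psi=-0.3, alpha=0.4, m=2.0)
print(np.isclose(xi, 0.0), yi)          # xi = 0 when phi = alpha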


3.6.2. Active Tracking Control Law. Consider the angular tracking error δ = φ −α. The result of Eq. (59) is derived in a straightforward manner and provides the angular position error δ as a function of target image coordinates.

\[ \delta = \arctan\left(\frac{x_i}{y_i}\right) \tag{59} \]

The global mapping function F is in a compact form, thus it does not depend on ρ. Solving the equation $x_i = F(\Omega, \alpha)$ with respect to Ω, both φ and ψ are obtained as functions of the target position in the image and the catadioptric system pose. Replacing the result in the first two columns of $J_F^\Omega$, the compact jacobian $\hat J_F^\Omega$ is obtained as a function of the image coordinates and the camera pose (Eq. (60)). Matrix $J_F^\alpha$ of Eq. (61) is derived in the same way. The singularity for $x_i/y_i = \cot(\alpha)$ is similar to the one that appears for the pan and tilt tracking situation.

\[ \hat J_F^{\Omega} = \begin{bmatrix} \dfrac{x_i^2+y_i^2}{y_i-x_i\tan(\alpha)} & \dfrac{\left((m-l)^2(1+\tan(\alpha)^2)+(y_i-x_i\tan(\alpha))^2\right)x_i\cos(\alpha)}{(m-l)(y_i-x_i\tan(\alpha))} \\[2mm] \dfrac{\tan(\alpha)(x_i^2+y_i^2)}{y_i-x_i\tan(\alpha)} & \dfrac{\left((m-l)^2(1+\tan(\alpha)^2)+(y_i-x_i\tan(\alpha))^2\right)y_i\cos(\alpha)}{(m-l)(y_i-x_i\tan(\alpha))} \end{bmatrix} \tag{60} \]

\[ J_F^{\alpha} = (-y_i,\; x_i)^t \tag{61} \]

The X coordinate of the target position in the image is zero whenever the angular error δ is zero. The position command such that $x_i$ becomes null is $\alpha_c = \alpha + \delta$, with δ provided by Eq. (59).

Consider the target velocity vector $\dot{\hat\Omega} = (\dot\phi, \dot\psi)^t$. Since matrix $\hat J_F^\Omega$ is non-singular, it follows that $\dot{\hat\Omega} = (\hat J_F^\Omega)^{-1}(\dot x_i - J_F^\alpha\dot\alpha)$ (Eqs. (34) and (39)). Assume that $\bar J_F^\Omega$ and $\bar J_F^\alpha$ are the first rows of $\hat J_F^\Omega$ and $J_F^\alpha$. The goal is to determine the velocity command $\dot\alpha_c$ such that the image velocity $\dot x_i$ along the X direction is zero. The desired velocity is $\dot x_d = \bar J_F^\Omega\dot{\hat\Omega} + \bar J_F^\alpha\dot\alpha_c$ (Eq. (28)). Making $\dot x_d = 0$ and replacing $\dot{\hat\Omega}$, it follows that $\dot\alpha_c = -(\bar J_F^\alpha)^{-1}\bar J_F^\Omega(\hat J_F^\Omega)^{-1}(\dot x_i - J_F^\alpha\dot\alpha)$. The result of Eq. (62) is achieved taking into account that $\bar J_F^\Omega(\hat J_F^\Omega)^{-1} = (1, 0)$.

\[ \begin{cases} \alpha_c = \alpha + \arctan\left(\dfrac{x_i}{y_i}\right) \\[1mm] \dot\alpha_c = \dot\alpha + \dfrac{\dot x_i}{y_i} \end{cases} \tag{62} \]
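A sketch of the control law of Eq. (62) as reconstructed above, computing the position and velocity commands from the image measurements (numeric values are illustrative):

import numpy as np

def omni_commands(alpha, alpha_dot, xi, yi, xi_dot):
    """Position and velocity commands of Eq. (62) for the rotating catadioptric camera."""
    alpha_c = alpha + np.arctan(xi / yi)       # drive the target image onto the Yi axis
    alpha_dot_c = alpha_dot + xi_dot / yi      # cancel the horizontal image velocity
    return alpha_c, alpha_dot_c

print(omni_commands(alpha=0.1, alpha_dot=0.0, xi=0.05, yi=0.3, xi_dot=0.01))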

4. Conclusions

In this paper we show that the selection of coordinate systems is important in vision applications. In addition we introduce two constraints, the compactness constraint and the decoupling constraint. A coordinate system that verifies one or both of the constraints has several advantages with respect to the relationships between the position and velocity variables. Coordinate systems defined according to these criteria are also presented for several applications.

References

Baker, S. and Nayar, S. 1998. A theory of catadioptric image formation. In Proc. of IEEE International Conference on Computer Vision, Bombay, pp. 35–42.
Barreto, J.P. and Araujo, H. 2001. Issues on the geometry of central catadioptric imaging. In Proc. of the IEEE Int. Conf. on Computer Vision and Pattern Recognition, Kauai, Hawaii, USA.
Barreto, J.P., Batista, J., and Araujo, H. 1999a. Tracking multiple targets in 3d. In IROS99—IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, Kyongju, Korea, pp. 233–246.
Barreto, J.P., Peixoto, P., Batista, J., and Araujo, H. 1999b. Evaluation of the robustness of visual behaviors through performance characterization. In Robust Vision for Vision-Based Control of Motion, Markus Vincze and Gregory D. Hager (Eds.), IEEE Press.
Barreto, J.P., Peixoto, P., Batista, J., and Araujo, H. 1999c. Improving 3d active visual tracking. In Computer Vision Systems, Lecture Notes in Computer Science, vol. 1542, Henrik I. Christensen (Ed.), Springer.
Barreto, J.P., Peixoto, P., Batista, J., and Araujo, H. 2000. Mathematical analysis for visual tracking assuming perspective projection. In SIRS2000—8th Int. Symposium on Intelligent Robotic Systems, Reading, England, pp. 233–246.
Batista, J., Dias, J., Araujo, H., and Almeida, A. 1995. The ISR multi-degree-of-freedom active vision robot head: Design and calibration. In Proc. of M2VIP'95—Second International Conference on Mechatronics and Machine Vision in Practice, Hong Kong.
Batista, J., Peixoto, P., and Araujo, H. 1997. Real-time visual behaviors with a binocular vision system. In ICRA97—IEEE International Conference on Robotics and Automation, New Mexico.
Bogner, S. 1995. Introduction to panoramic imaging. In Proc. of IEEE SMC Conference, pp. 3100–3106.
Chroust, S., Vincze, M., Traxl, R., and Krautgartner, P. 2000. Evaluation of processing architecture and control law on the performance of vision-based control systems. In AMC00—IEEE/RSJ Int. Workshop on Advanced Motion Control, Japan, pp. 19–24.
Corke, P.I. 1996. Visual Control of Robots: High-Performance Visual Servoing. Mechatronics, John Wiley.
Courant, R. 1998. Differential and Integral Calculus. Wiley Classic Library.
Geyer, C. and Daniilidis, K. 2000. A unifying theory for central panoramic systems and practical implications. In Proc. of European Conference on Computer Vision, Dublin, pp. 445–461.


Gluckman, J. and Nayar, S. 1998. Egomotion and omnidirectional cameras. In Proc. of IEEE International Conference on Computer Vision, Bombay, pp. 999–1005.
Hartley, R. and Zisserman, A. 2000. Multiple View Geometry in Computer Vision. Cambridge University Press.
Hecht, E. and Zajac, A. 1974. Optics. Addison-Wesley.
Krautgartner, P. and Vincze, M. 1998. Performance evaluation of vision-based control tasks. In IEEE International Conference on Robotics and Automation, Leuven, Belgium.
Laveau, S. and Faugeras, O. 1996. Oriented projective geometry for computer vision. In Proc. of European Conference on Computer Vision.
Malis, E., Chaumette, F., and Boudet, S. 1999. 2 1/2 D visual servoing. IEEE Trans. on Robotics and Automation, 15(2):238–250.
Murray, R.M., Li, Z., and Sastry, S. 1994. A Mathematical Introduction to Robotic Manipulation. CRC Press.
Nalwa, V.S. 1996. A true omnidirectional viewer. Technical Report, Bell Laboratories, Holmdel, NJ, USA.


Sharkey, P., Murray, D., Vandevelde, S., Reid, I., and McLauchlan, P. 1993. A modular head/eye platform for real-time reactive vision. Mechatronics, 3(4):517–535.
Stolfi, J. 1991. Oriented Projective Geometry. Academic Press Inc.
Svoboda, T., Pajdla, T., and Hlavac, V. 1998. Epipolar geometry for panoramic cameras. In Proc. of European Conference on Computer Vision, Freiburg, Germany, pp. 218–332.
Yagi, Y. and Kawato, S. 1990. Panoramic scene analysis with conic projection. In Proc. of the International Conference on Robots and Systems.
Yamazawa, K., Yagi, Y., and Yachida, M. 1993. Omnidirectional imaging with hyperboloidal projection. In Proc. of the International Conference on Robots and Systems.
Yamazawa, K., Yagi, Y., and Yachida, M. 1995. Obstacle avoidance with omnidirectional image sensor HyperOmni Vision. In Proc. of the IEEE International Conference on Robotics and Automation, pp. 1062–1067.
