IEEE TRANSACTIONS PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. X, NO. Y, MONTH Z
A Unified Strategy for Landing and Docking using Spherical Flow Divergence

Chris McCarthy, Member, IEEE, and Nick Barnes, Member, IEEE
Abstract— We present a new visual control input from optical flow divergence, enabling the design of novel, unified control laws for docking and landing. While divergence-based time-to-contact estimation is well understood, the use of divergence in visual control currently assumes knowledge of surface orientation and/or egomotion. There exists no directly observable visual cue capable of supporting approaches to surfaces of arbitrary orientation under general motion. Central to our measure is the use of the maximum flow field divergence on the view sphere (max-div). We prove kinematic properties governing the location of max-div, and show that max-div provides a temporal measure of proximity. From this we contribute novel control laws for regulating both approach velocity and angle of approach towards planar surfaces of arbitrary orientation, without structure-from-motion recovery. The strategy is tested in simulation, over real image sequences, and in closed-loop control of docking/landing manoeuvres on a mobile platform.

Index Terms— robot vision, visuo-motor control, visual navigation, optical flow
I. INTRODUCTION
PERFORMING controlled approaches towards surfaces is an essential capability for any system seeking to interact with objects in its environment. On a factory floor, docking manoeuvres are commonly required to allow a vehicle to lift, plug into, or manipulate objects. Insects such as honeybees perform smooth graze-landings on non-frontal surfaces by reducing their approach velocity in proportion to perceived surface proximity [1]. In so doing, the honeybee is virtually stationary when touchdown occurs. Central to all these manoeuvres is the ability to perceive time-to-contact with the surface, and to control approach velocity accordingly. For near-frontal approaches, the apparent expansion (or looming) is most commonly used to achieve this, with strong biological support (e.g., [2], [3], [4], [5], [6]). Nelson and Aloimonos [7] provide the first derivation of divergence-based time-to-contact estimation for robot motor control. Traditionally, such schemes have been applied to collision detection and avoidance tasks [8], [9], [10], [11]. Only limited attention has been given to its use for surface approaches. A limitation impeding current schemes is that time-to-contact can be precisely estimated only along the optical axis, or if the surface is fronto-parallel with the image plane. Thus, existing divergence-based control schemes assume or enforce either: fronto-parallel surface alignment (e.g., [12], [13]); a known non-frontal approach angle (e.g., [14]); or knowledge

Manuscript submitted Dec 24, 2009. Chris McCarthy and Nick Barnes are with the NICTA Canberra Research Lab, and the Dept. of Information Engineering, Australian National University, Australian Capital Territory, Australia.
of egomotion and/or surface orientation (e.g., [15], [16]). While state-of-the-art schemes such as [15] obtain robust time-to-contact estimates for angled surfaces via closed contours, knowledge of surface gradient and/or parallel translation in the image is still required for its use in control. In summary, there is currently no directly observable visual control input capable of supporting controlled approaches towards surfaces of arbitrary orientation, under general motion, without structure-from-motion recovery.

We present a new visual measure for control that generalises existing divergence-based control schemes to planar surfaces of arbitrary orientation, enabling the design of novel, unified control laws for landing and docking. Central to the proposed measure is the use of the divergence maximum on the view sphere (max-div). For a view sphere approaching a planar surface, the max-div point will always be located halfway along the arc connecting the direction of translation and the planar surface normal. This property offers significant advantages for both velocity and heading control during docking and landing manoeuvres. Koenderink and Van Doorn [17], [18] were the first to note this property. However, no one has considered the use of the max-div point for visual control, or provided a formal proof of this property. The novel contributions of this paper are:
1) the exposure of max-div as a directly measurable cue for unified landing/docking, including a formal proof of the max-div property;
2) visuo-motor control laws for regulating both approach speed and angle using max-div, under general motion and without structure-from-motion recovery;
3) experimental validation of the max-div property and control schemes, over synthetic and real image sequences, and in closed-loop control of docking/landing manoeuvres on a mobile platform.
Robustness of max-div measurement is also demonstrated under conditions of reduced surface texture and increasing planar surface perturbance (using synthetic image sequences). The paper is organised as follows. Section II reviews background and related work in divergence and its relationship with time-to-contact estimation. Section III unifies existing results in divergence-based control under a spherical divergence framework, motivating the use of spherical projection for visual control. Section IV introduces max-div, and provides a formal proof of its kinematic properties. Section V proposes a max-div docking and landing scheme. Section VI presents both open- and closed-loop experiments. A discussion of results and conclusions are provided in Sections VII and VIII, respectively.
TABLE I
EXISTING SCHEMES FOR DOCKING, LANDING AND COLLISION AVOIDANCE.
II. BACKGROUND AND RELATED WORK

A. Background
Flow field divergence can be measured directly from the optical flow field such that:

div = u_x + v_y,  (1)

where u_x and v_y are partial derivatives of the flow components (u, v) in their respective directions. Divergence may be used to infer the time-to-contact of an approaching surface. Time-to-contact is defined as the ratio of the surface distance and its component of velocity toward the observer, such that for a viewing direction \hat{p} \in R^3:

\tau(\hat{p}) = \frac{R(\hat{p})}{(\hat{p} \cdot t)},  (2)
Summing the orthogonal partial derivatives of f(\hat{p}), the divergence within a local tangent plane to S about \hat{p} (referred to as div_{\hat{p}}) is obtained [20], [21], [15]:

div_{\hat{p}} = \frac{2(\hat{p} \cdot t)}{R(\hat{p})} + \frac{(r \cdot t_s)}{R(\hat{p})},  (4)

where

r = \frac{\nabla R(\hat{p})}{R(\hat{p})}, \quad t_s = t - (\hat{p} \cdot t)\hat{p},  (5)
Conditions of operation | References
Motion and viewing axis alignment | [7], [8], [9], [25], [10], [11], [26]
Fronto-parallel surface-camera alignment | [12], [13], [26], [22]
Known direction of surface gradient or translation in image | [15], [27], [16]
Assumed/recovered egomotion and surface orientation | [23], [24], [13], [28], [14], [29]
Arbitrary motion and surface orientation | Proposed measure (max-div)
where R(\hat{p}) is the radial depth of the surface along \hat{p}, and t \in R^3 is the translational motion of the surface with respect to the observer. For convenience we refer to \tau(\hat{p}) as \tau_{\hat{p}}. The relationship between div and time-to-contact is made apparent in the world frame. Let S \in R^3 be a unit view sphere, occupying a 3D camera-centred coordinate space, moving with velocity t \in R^3 and rotation \omega \in R^3. Given motion with respect to a stationary surface, the resulting optical flow on the surface of S is given by [19]:

f(\hat{p}) = \frac{1}{R(\hat{p})} \left[ (\hat{p} \cdot t)\hat{p} - t \right] - \omega \times \hat{p}.  (3)
and \nabla R(\hat{p}) is the surface depth gradient. Thus, divergence consists of two components: the first relating directly to \tau_{\hat{p}}^{-1}; the second relating divergence to a local shearing deformation of the flow field due to r, the projected surface depth gradient vector, and t_s, the component of translation parallel to the local tangent plane. The confounding of r and t_s in this deformation implies that, in general, div_{\hat{p}} only defines a bound on \tau_{\hat{p}}, complicating its use for control. However, previous work has exploited specific conditions under which divergence precisely defines \tau_{\hat{p}}, such as when:
1) motion is along the viewing direction (i.e., |t_s| = 0), e.g., [7], [8], [10];
2) the surface is fronto-parallel with the local tangent plane (i.e., |r| = 0), e.g., [12], [22];
3) one of surface gradient direction (i.e., \hat{r}) or translation direction (i.e., \hat{t}) is known, e.g., [15], [16]; or,
4) egomotion and surface orientation are known, or recovered explicitly, e.g., [23], [24], [13], [14].
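As a concrete illustration of Eq. (1), the following sketch (ours, not the paper's code, with assumed values) builds a synthetic expanding flow field and estimates its divergence by finite differences; for a purely frontal approach the flow is u = x/\tau, v = y/\tau, and the divergence equals 2/\tau, twice the inverse time-to-contact.

```python
import numpy as np

# Sketch (assumed values, not from the paper): divergence of a dense flow
# field via finite differences, div = u_x + v_y. For a frontal approach the
# flow is a pure expansion whose divergence is exactly 2/tau.
tau = 2.0                                    # time-to-contact in seconds
x, y = np.meshgrid(np.linspace(-1, 1, 64),
                   np.linspace(-1, 1, 64))   # image-plane coordinates
u, v = x / tau, y / tau                      # expanding flow field

# np.gradient returns derivatives along (rows, cols) = (y, x)
du_dy, du_dx = np.gradient(u, y[:, 0], x[0, :])
dv_dy, dv_dx = np.gradient(v, y[:, 0], x[0, :])
div = du_dx + dv_dy                          # per-pixel divergence

print(div.mean())   # ≈ 1.0, i.e., 2 / tau
```

In practice the flow comes from an estimator rather than a closed form, and the sheared component of Eq. (4) contaminates this simple estimate for non-frontal surfaces, which is exactly the limitation the following subsection surveys.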
B. Previous work

Motion-camera alignment is the most common assumption employed when using divergence for visual control. Exploiting this condition, Nelson and Aloimonos [7] were the first to demonstrate the use of divergence for obstacle avoidance, by guiding a camera-mounted robot arm away from areas of high divergence in the image. Ancona and Poggio [8] place 1D correlation patches symmetrically about the image origin to measure expansion. More recent applications of such schemes have been reported in [9], and by Green et al. [25]; however, both assume translational motion only. Restrictions on motion-camera alignment are removed if the surface of interest is fronto-parallel. Questa and Sandini [26], for example, obtain divergence estimates from the radial component of motion under a log-polar mapping. This scheme, originally proposed by Tistarelli and Sandini [12], is demonstrated to support fronto-parallel docking manoeuvres. McCarthy et al. [22] propose measuring divergence at the FOE to alleviate strict assumptions on motion-camera alignment. This enables robust near-frontal surface approaches, but degrades for significantly non-frontal surfaces. Alleviating strict assumptions of surface and motion-camera alignment, Cipolla and Blake [15] exploit Green's theorem to obtain robust divergence estimates, using B-splines to track the temporal changes in the moments of area of a closed contour. The scheme is demonstrated to support general docking and surface alignment tasks; however, knowledge of translation parallel to the image plane, or of surface orientation, is still required. From a study of graze-landing honeybees, Srinivasan et al. [1] propose holding the angular motion of the ground plane constant to achieve smooth graze landings. Implementations of such approaches are described in [29], [25]. However, these techniques assume the approach angle is known, not frontal, and that motion is translational only.
In summary, there is currently no directly obtainable visual control input capable of supporting controlled approaches to surfaces of arbitrary orientation without assumed knowledge or continual recovery of structure-from-motion solutions in the control loop.
III. SPHERICAL DIVERGENCE FOR CONTROL

Spherical projection is central to the visuo-motor control input and schemes presented in this paper. While the enabling properties of spherical projection have been previously reported [20], [7], [30], [27], their relevance to visuo-motor control has not. Here we unify existing results under a parameterised spherical framework, and show how spherical projection facilitates visuo-motor control with divergence. Continuing our notation from Section II, we define the radial distance to a point on the surface of a plane, R_n, as:

R_n(\hat{p}, \hat{n}) = \frac{R_o}{(\hat{p} \cdot \hat{n})},  (6)

where \hat{n} \in R^3 gives the direction of the planar surface normal on the sphere (i.e., the closest surface point), and R_o \in R is the distance to this point. We limit the domain of interest to the subset of viewing directions, S_n, along which the radial surface depth is positive and finite, such that:

S_n = \{ \hat{p} : \cos^{-1}(\hat{p} \cdot \hat{n}) < \frac{\pi}{2} \}.  (7)

This defines a hemisphere of viewing directions about \hat{n}. Substituting Eq. (6) into Eq. (4), we obtain the spherical divergence on S_n:

div_{\hat{p}} = \frac{1}{R_o} \left( 3(\hat{p} \cdot t)(\hat{p} \cdot \hat{n}) - (\hat{n} \cdot t) \right),  (8)

thereby parameterising divergence in all viewing directions along which R_n projects. Note that for a given viewing direction, \hat{p}, the divergence is equivalent to that derived by [7]. As in Eq. (4), a precise estimate of \tau_{\hat{p}} is available only if \hat{t}, or \hat{n}, is known. However, if \hat{p} = \hat{t} (i.e., the direction of translation) then Eq. (8) becomes:

div_{\hat{t}} = \frac{2(\hat{n} \cdot t)}{R_o} = 2\tau_{\hat{t}}^{-1},  (9)

If \hat{p} = \hat{n} (i.e., the surface normal), we also obtain:

div_{\hat{n}} = \frac{2(\hat{n} \cdot t)}{R_o} = 2\tau_{\hat{n}}^{-1},  (10)

That is, div_{\hat{n}} = div_{\hat{t}}. These results imply two significant advantages for visual control from spherical divergence:
1) If the surface projects along \hat{t} or \hat{n}, then divergence always provides a precise time-to-contact estimate. This is an improvement on perspective flow, where \hat{t}, or \hat{n}, must also be aligned with the optical axis.
2) If \hat{t} and \hat{n} are fixed with respect to each other (i.e., via control), then (from Eq. (8)) \tau_{\hat{p}} can be continuously measured along any \hat{p} \in S_n that remains stable with respect to \hat{n}. Thus, control is not limited to observations along \hat{t}, or \hat{n}, if (\hat{n} \cdot \hat{t}) is maintained.

While these properties suggest continual estimates of \hat{n} and \hat{t} are required, we show that an equivalent control outcome is achieved using only the spherical divergence maximum.

Fig. 1. The max-div property: (a) on the view sphere; (b) framework for proof of the max-div property.
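The equality of Eqs. (9) and (10) is easy to verify numerically. The sketch below (ours; the vectors and distances are arbitrary assumed values) implements Eq. (8) directly and confirms that evaluating it along \hat{t} and along \hat{n} yields the same value, 2(\hat{n} \cdot t)/R_o.

```python
import numpy as np

# Numeric check (ours; arbitrary assumed values) of Eqs. (8)-(10): spherical
# divergence evaluated along t_hat and along n_hat agree, both giving
# 2 (n_hat . t) / Ro, with neither direction tied to an optical axis.

def div_sphere(p, t, n, Ro):
    """Spherical divergence of Eq. (8) for unit viewing direction p."""
    return (3.0 * np.dot(p, t) * np.dot(p, n) - np.dot(n, t)) / Ro

Ro = 2.5                                  # distance to closest surface point
n = np.array([0.0, 0.0, 1.0])             # surface normal direction
t = 0.8 * np.array([np.sin(0.6), 0.0, np.cos(0.6)])  # translation, 0.6 rad off n
t_hat = t / np.linalg.norm(t)

d_t = div_sphere(t_hat, t, n, Ro)         # divergence along translation
d_n = div_sphere(n, t, n, Ro)             # divergence along surface normal
print(np.isclose(d_t, d_n), np.isclose(d_t, 2 * np.dot(n, t) / Ro))
```

The same function evaluated at intermediate directions traces out the divergence profile whose maximum Section IV characterises.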
IV. THE POINT OF MAXIMUM DIVERGENCE

Let \theta_{nt} be the angle of approach, as defined by the angle subtending the arc connecting \hat{t} and \hat{n} along E_m. For a direction \hat{m} \in E_m, let \theta_{mt} and \theta_{mn} be the angles subtending the arcs connecting \hat{t} and \hat{m}, and \hat{n} and \hat{m}, respectively (see Figure 1).

Theorem 4.1: Given an approach angle \theta_{nt} \in [0, \pi/2) towards a planar surface, the maximum divergence induced by the relative motion between an infinite planar surface and view sphere will always occur halfway along the shortest arc of the great circle connecting \hat{t} and \hat{n}.

Proof: We first show that max-div must occur on the great circle, E_m \in S_n, passing through \hat{n} and \hat{t}. For any point \hat{q} \in S_n, \hat{q} \notin E_m, there exists a great circle E_n passing through \hat{q} that is orthogonal to E_m at the intersection of both great circles. Let \hat{m} mark the point of intersection of E_m and E_n within S_n. To prove max-div must occur on E_m, we show that for any \hat{q} \notin E_m, there always exists a point \hat{m} \in E_m \cap E_n such that div_{\hat{m}} > div_{\hat{q}}. Recalling Eq. (8), this can be reduced to showing:

(\hat{m} \cdot \hat{n})(\hat{m} \cdot \hat{t}) > (\hat{q} \cdot \hat{n})(\hat{q} \cdot \hat{t}).  (11)

As \hat{m} marks the intersection of E_m and E_n, it must lie on the shortest arc connecting E_n to \hat{t} and \hat{n}. Thus, \hat{q} is always at a greater angular distance from \hat{t} and \hat{n} than \hat{m}, such that (\hat{q} \cdot \hat{t}) < (\hat{m} \cdot \hat{t}) and (\hat{q} \cdot \hat{n}) < (\hat{m} \cdot \hat{n}), implying:

\frac{(\hat{m} \cdot \hat{n})}{(\hat{q} \cdot \hat{n})} > 1, \quad and \quad \frac{(\hat{q} \cdot \hat{t})}{(\hat{m} \cdot \hat{t})} < 1.  (12)

Note that \hat{m} and \hat{q} will always be located within the same hemisphere with respect to \hat{t}, ensuring the quotient of their dot products with \hat{t} will always be non-negative. Thus:

\frac{(\hat{m} \cdot \hat{n})}{(\hat{q} \cdot \hat{n})} > \frac{(\hat{q} \cdot \hat{t})}{(\hat{m} \cdot \hat{t})},  (13)

and with simple re-arrangement:

(\hat{m} \cdot \hat{n})(\hat{m} \cdot \hat{t}) > (\hat{q} \cdot \hat{n})(\hat{q} \cdot \hat{t}),  (14)

thereby proving that div_{\hat{m}} > div_{\hat{q}}, and hence the maximum divergence must occur on E_m.

We now prove that max-div occurs halfway along the shortest arc connecting \hat{t} and \hat{n} on E_m. Re-writing Eq. (8)
in angular form, we obtain:

div_{\hat{p}} = \frac{|t|}{R_o} \left( 3\cos(\theta_{pn})\cos(\theta_{pt}) - \cos(\theta_{nt}) \right).  (15)

Constraining \hat{p} to E_m, we set \theta_n to zero, and represent all other locations on E_m with respect to \hat{n}, such that:

\theta_{mn} \mapsto \theta_m, \quad \theta_{nt} \mapsto \theta_t, \quad \theta_{mt} \mapsto \theta_m - \theta_t,

and re-write Eq. (15) as:

div_{\hat{m}} = \frac{|t|}{R_o} \left( 3\cos(\theta_m)\cos(\theta_m - \theta_t) - \cos(\theta_t) \right).  (16)

Maximising Eq. (16), we solve its derivative with respect to \theta_m to obtain:

0 = \cos(\theta_m)\sin(\theta_m - \theta_t) + \sin(\theta_m)\cos(\theta_m - \theta_t) = \sin(2\theta_m - \theta_t).

Therefore:

0 = 2\theta_m - \theta_t,  (17)
\theta_m = \frac{\theta_t}{2}.  (18)

It therefore follows that \theta_{mn} = \theta_t/2 and \theta_{mt} = -\theta_t/2, thus proving that \hat{m} lies halfway along the arc connecting \hat{n} and \hat{t}. This concludes the proof.

Substituting \theta_m for \frac{1}{2}\theta_t in Eq. (15), we obtain the following expression for max-div:

div_{max} = \frac{|t|}{R_o} \left( 3\cos^2\left(\frac{\theta_{nt}}{2}\right) - \cos(\theta_{nt}) \right).  (19)

Applying the identity \cos(2\theta) = 2\cos^2(\theta) - 1, we obtain:

div_{max}^{-1} = \frac{R_o}{|t|} \left( \cos^2\left(\frac{\theta_{nt}}{2}\right) + 1 \right)^{-1}.  (20)

Recalling Eq. (9), we define div_{max}^{-1} in terms of \tau_{\hat{t}} such that:

div_{max}^{-1} = \tau_{\hat{t}} \left( \frac{1}{2}\left( 1 + \frac{3}{\cos(\theta_{nt})} \right) \right)^{-1}.  (21)

Thus, div_{max}^{-1} \propto \tau_{\hat{t}} (and from Eq. (10), \propto \tau_{\hat{n}}). This new result for visuo-motor control allows div_{max}^{-1} to be applied in place of \tau_{\hat{t}} and \tau_{\hat{n}} without explicit knowledge of \hat{t} or \hat{n} in the image.

V. UNIFYING LANDING AND DOCKING USING MAX-DIV

From the results of Theorem 4.1 and Eq. (21), we propose two novel control laws to regulate approach velocity and angle.

A. Velocity control

We adopt a similar scheme to [22], in which divergence is held constant at the FOE to achieve near-frontal approaches. We generalise this (via Eq. (21)) to any approach angle by replacing FOE divergence with div_{max}^{-1}, such that:

v(t) = \Delta v(t - 1) + K_v \left( div_{ref} - div_{max}(t) \right),  (22)

where v(t) is the velocity control input at time t, K_v is a proportional gain, div_{max}(t) is the maximum divergence within the projected surface region, and div_{ref} is the divergence
Fig. 2. Geometric model for approach angle control on the view sphere.
level to maintain. From Eq. (21), the relationship between the approach velocity and div_{ref} is defined as:

div_{ref} = \frac{|t|}{2R_o} \left( \cos(\theta_{nt}) + 3 \right).  (23)

Note that div_{ref} may be tuned empirically without knowledge of \frac{|t|}{R_o} or \cos(\theta_{nt}). Once tuned, the implicit scaling of div_{max} estimates by \cos(\theta_{nt}) allows div_{ref} to be safely applied across all approach angles, resulting in slower maximum speeds (for a given R_o) for near-frontal approaches (\theta_{nt} ≈ 0°), increasing as \theta_{nt} → 90°.¹

B. Regulating approach angle

We propose a novel strategy for maintaining a constant approach angle, \theta_{nt}. Let \theta_t, \theta_n \in [0, 2\pi] be the angular locations of \hat{t} and \hat{n} respectively, on the great circle, E_m, passing through both points. Let \hat{m} be the direction of the max-div point on the great circle E_m, and \theta_m its angular location on E_m (see Figure 2). From Theorem 4.1, and assuming a stationary target surface (i.e., \theta_n is fixed), we note that changes in \theta_m (\Delta\theta_m) arise only due to changes in \theta_t (\Delta\theta_t), such that:

\Delta\theta_m = \frac{1}{2}\Delta\theta_t.  (24)

Based on this, we propose maintaining \theta_{nt} by keeping \theta_m at a constant location on E_m such that:
\theta_t(t) = \theta_t(t - 1) - 2K_\theta(\theta_s - \theta_m(t)),  (25)
where \theta_s is the desired absolute location of max-div on the view sphere, \theta_m(t) is the measured location of max-div at time t, and K_\theta is a proportional gain. Note that the scale factor 2 reflects the relationship expressed in Eq. (24).

1) Setting the approach angle, \theta_s: To set a specific approach angle, an initial reference frame is required. This can be minimally determined via an initial estimate of \hat{n} or \hat{t}, and the direction of the shortest arc on the view sphere connecting them. If \hat{t} is known, then the desired angle is defined as twice the angular distance of \theta_s from \theta_t in the direction of \theta_n, with the constraint |\theta_t - \theta_s| < 45°. The approach angle may be equivalently defined with respect to \hat{n}. Thus, fronto-parallel approaches are achieved by setting \theta_s to \theta_n or \theta_t. Once set, heading control operates without need for continued egomotion or surface orientation recovery thereafter.

¹ [22] provides a proof of stability for this control framework.
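As a numerical sanity check of Theorem 4.1 and Eq. (21) (our sketch, with arbitrary assumed values for the speed, R_o, and the approach angle), we can sweep Eq. (16) along E_m and confirm that the maximum sits at \theta_t/2, and that div_{max}^{-1} matches the rescaled time-to-contact:

```python
import numpy as np

# Numeric check (ours, assumed values) that divergence along the great
# circle through t_hat and n_hat, Eq. (16), peaks at theta_t / 2, and that
# Eq. (21) holds at that peak.
speed, Ro = 1.0, 3.0
theta_t = np.deg2rad(50.0)            # approach angle theta_nt (n_hat at 0)

theta_m = np.linspace(-np.pi / 2, np.pi / 2, 100001)
div = (speed / Ro) * (3 * np.cos(theta_m) * np.cos(theta_m - theta_t)
                      - np.cos(theta_t))          # Eq. (16)
theta_max = theta_m[np.argmax(div)]
print(np.rad2deg(theta_max))          # ≈ 25.0 = theta_t / 2

# Eq. (21): div_max^-1 equals tau along t_hat, rescaled by the approach angle
tau_t = Ro / (speed * np.cos(theta_t))            # time-to-contact along t_hat
lhs = 1.0 / div.max()
rhs = tau_t / (0.5 * (1.0 + 3.0 / np.cos(theta_t)))
print(abs(lhs - rhs) < 1e-6)
```

The grid search stands in for what the controller does implicitly: it never needs \hat{t} or \hat{n}, only the location and value of the divergence peak.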
2) Heading corrections and orientation: Eq. (25) fixes \hat{m} to an absolute location on the view sphere, but does not constrain E_m. Thus, arbitrary rotations about \hat{m} are possible and will not affect the resulting approach angle correction. To fix E_m, a second reference point (e.g., the ground plane direction²) and two correcting rotations are required (i.e., orientation and approach angle corrections).

3) Extent of planar surface: The above control laws assume only that the target surface projects along the bisecting line between \hat{n} and \hat{t}, and is large enough to determine the point as the maximum. In practice, this size will be determined by the spatial support required to obtain robust divergence estimates (e.g., weakly textured surfaces may require larger spatial support to reduce flow estimation error). Spatial support ranged from 30° to 50° to obtain workable divergence estimates in our results (Section VI).

VI. EXPERIMENTAL RESULTS

We first validate max-div via open-loop testing, including under conditions of reduced surface texture and planar surface perturbance. We then present on-board closed-loop testing of the proposed control laws.

A. Experimental methods

1) Sensors: All experiments employ one of two cameras:
• The Omni-tech Unibrain Fire-i BCL 1.2 lens (the omnitech camera) projects a 190° FOV onto a standard CCD chip.
• The Point Grey Research Ladybug camera (the ladybug camera) captures 512 × 512 pixel, 180° FOV images via stitched images from six rigidly mounted cameras.
Calibration data maps image points from both sensors to the unit sphere, and thus to a viewing direction.

2) Synthetic image sequences: A number of synthetic image sequences were constructed for open-loop testing³. Synthetic landing sequences (synth-landing) (Fig. 3(a)) simulate camera motion towards a surface of high texture and perfect planarity. Four sequences were constructed to depict motion along uniformly distributed approach angles: 0° (i.e., frontal), 22.5°, 45° and 67.5°.
Synthetic reduced-texture sequences (synth-texture) depict approaches towards texture-mapped planar surfaces of reducing spatial frequency. Texture is determined by a colour homogeneity value h ∈ [0, 1]: the proportion of surface area with zero intensity gradient (i.e., texture decreases as h → 1). Figures 3(a-b) show examples of h = 0.1 and 0.3 respectively. Synthetic bump sequences (synth-bump) depict landing trajectories towards increasingly non-planar surfaces of uniform texture (h = 0.1). 100 peak/pit perturbations were randomly created and added to the planar surface elevation map according to a variation scale factor b ∈ [0, 1], indicating the mean height variation across the surface, expressed as a proportion of the camera's initial height above the unperturbed zero-elevation plane. Thus, elevation perturbations increase as b →
information may be available from the horizon, or gravity sensors. image sequences available from http://cecs.anu.edu.au/ ˜cdmcc/maxdiv 3 All
1. Perturbation width was randomly selected as a proportion of the total plane size, with an upper bound of 0.5. Figure 5 shows an example for b = 0.13.

3) Real image sequences: Controlled indoor sequences (indoor-landing) depict motion towards a carpeted planar floor. Images were captured using the omnitech camera, mounted on a motor-controlled lift platform providing constant motion for each descent. Sequences were captured for 0°, 22.5°, 45° and 67.5° approach angles by tilting the lift platform to each angle. The camera was mounted 45° from the lift platform to ensure \theta_m was always within the view field. Unlike synth-landing, other surfaces are also present, and camera-surface alignment varies with each angle of approach. Figures 3(c-d) show examples of 0° and 67.5°. Hand-held outdoor sequences were captured using the ladybug camera. The camera was manually lowered towards the ground at approximately constant velocity and trajectory (though subject to natural variation). Images were captured for a steep descent towards a weakly textured cement surface (cement-landing), and a shallower descent towards an unevenly grassed surface (grass-landing). See Figures 3(e) and (f).

4) Optical flow and divergence computation: The max-div scheme is implemented using divergence taken directly from optical flow gradients. Optical flow is computed in the original image using a pyramidal implementation of Lucas and Kanade's gradient-based technique [31], as described by Bouguet [32] and provided in the OpenCV developers library⁴. Figures 3(e-f) show example flow fields from this method. Divergence is computed in local tangent planes of size N × N pixels, spanning a visual field of β degrees (N = 32, β = 30° unless otherwise stated). Flow vectors within the visual field are sampled from the original image and re-projected into the corresponding tangent plane.
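A minimal sketch of the tangent-plane divergence computation (ours: a 3-tap central-difference kernel stands in for the size-7 Sobel kernels, and an ideal expansion flow stands in for Lucas-Kanade output):

```python
import numpy as np

# Sketch (assumed kernel and flow, not the paper's implementation):
# differentiate the tangent-plane flow components with a separable 1D
# derivative kernel, sum to get per-sample divergence, then average.
N, extent, tau = 32, 1.0, 4.0
s = 2 * extent / (N - 1)                    # sample spacing in the patch
xs = np.linspace(-extent, extent, N)
x, y = np.meshgrid(xs, xs)
u, v = x / tau, y / tau                     # expansion flow: true div = 2/tau

k = np.array([1.0, 0.0, -1.0]) / (2.0 * s)  # central-difference kernel
conv = lambda r: np.convolve(r, k, mode='same')
ux = np.apply_along_axis(conv, 1, u)        # du/dx (along columns)
vy = np.apply_along_axis(conv, 0, v)        # dv/dy (along rows)

div_map = ux + vy                           # divergence at each sample point
div_est = div_map[1:-1, 1:-1].mean()        # average, ignoring border effects

print(div_est)   # ≈ 0.5, i.e., 2 / tau
```

With real flow, larger derivative kernels and the patch-wide average trade spatial resolution for noise robustness, which is why weakly textured surfaces need larger spatial support.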
Partial derivatives of the flow field, u_x and v_y, are obtained via the separate convolution of 1D Sobel kernels (size 7) over u and v in the x and y directions of the local tangent plane, respectively. Partial derivatives are summed at each sample point to obtain a divergence estimate, and then averaged across the tangent plane to obtain the divergence for the viewing direction. A comparison of divergence across the view field determines the max-div value (div_{max}) and location (\theta_m) on the view sphere⁵. \theta_m is the elevation component of the spherical coordinate of max-div.

B. Open-loop results

1) Synth-landing: div_{max}^{-1} and \theta_m were computed across all synth-landing sequences. Figure 4(a) shows div_{max}^{-1} responses (normalised) for the last 20 frames of each landing trajectory. For comparison, time and div_{max}^{-1} have been normalised to the range [0, 1], making ground truth the straight diagonal y = 1 − x. This follows from Eq. (21), and is valid because approach angle (i.e., \cos(\theta_{nt})) and speed are held constant

⁴ http://opencv.willowgarage.com
⁵ A reduced real-time implementation for closed-loop application was also developed, computing flow for every eighth pixel only, and velocity gradients without tangent-plane mapping (i.e., from image flow directly). While a loss of precision was observed, stability remained comparable and sufficient for closed-loop trials (comparison results omitted due to space constraints).
across each sequence. Thus, ground truth is the normalised time-to-contact with the surface. Results show close coherence with ground truth across all approach angles, particularly as surface proximity decreases (i.e., div_{max}^{-1} → 0). Table II (top row) gives accuracy details for synth-landing \theta_m estimates, showing the recorded mean position error of \theta_m with respect to theoretical expectation (Theorem 4.1). Mean \theta_m positions are all within ≈ 1° of predicted locations.

TABLE II
MEAN \theta_m POSITION ERRORS FOR SYNTHETIC AND INDOOR LANDING SEQUENCES. STANDARD DEVIATIONS IN BRACKETS.

approach | 0° | 22.5° | 45° | 67.5°
synth | 1.2° (8.5°) | −0.4° (6.1°) | 0.8° (4.0°) | 1.05° (4.25°)
indoor | 0.3° (11.9°) | 6.4° (5.6°) | 5.3° (7.5°) | 7.4° (3.3°)

TABLE III
MEAN \theta_m POSITION ERRORS FOR synth-texture AND synth-bump SEQUENCES. STANDARD DEVIATIONS IN BRACKETS.

approach | h = 0.0 | h = 0.3 | h = 0.5 | h = 0.6
0° | 1.2° (8.5°) | 2.7° (9.4°) | 4.1° (8.9°) | 34.5° (10.8°)
67.5° | −2.6° (7.2°) | −2.6° (7.4°) | 1.45° (12.1°) | 10.9° (15.5°)

approach | b = 0.02 | b = 0.08 | b = 0.16 | b = 0.32
0° | −4.5° (4.1°) | −4.5° (3.5°) | −5.3° (4.5°) | –
67.5° | −1.8° (1.8°) | 3.2° (6.9°) | −0.1° (7.3°) | 8.4° (0.9°)

Fig. 3. Examples of image sequences for open-loop testing including: (a) synthetic with high texture (h = 0.1), (b) synthetic reduced texture (h = 0.3), (c) real indoor 0° landing, (d) indoor 67.5° landing, (e) outdoor grass landing, and (f) outdoor landing to a weakly textured cement surface; (e-f) also show flow fields and estimated max-div location (white cross).

Fig. 4. div_{max}^{-1} signal plots across all approach angles for (a) synthetic image sequences, and (b) controlled indoor sequences.

Fig. 5. Example planar surface deformation used for synth-bump sequences.

Fig. 6. div_{max}^{-1} response for synth-bump (a) 0° and (b) 67.5° with increasing levels of planar surface deformation.

2) Indoor-landing: Figure 4(b) and Table II (bottom row) show results from the same experiment across indoor-landing sequences. Despite noisier conditions, varying camera-surface alignment and the presence of multiple surfaces, div_{max}^{-1} exhibits similar close coherence with theoretical expectation across all landing trajectories, with t = 0.65 for 0° providing the only exception. Errors in \theta_m estimates generally increase compared with synth-landing results. Standard deviations,
however, are similar, suggesting the stability of \theta_m estimates is comparable. Note that small errors in lift platform tilt and camera alignment are also present.

3) Reduced surface texture (synth-texture): Max-div was computed across a range of synth-texture sequences for both 0° and 67.5° approaches, to examine the robustness of \theta_m estimates under decreasing levels of surface texture. Table III summarises \theta_m errors for 0° and 67.5°, for increasing h values. It can be seen that \theta_m estimates remain relatively stable up until h = 0.5, after which a significant drop in accuracy is evident. While not shown, similar degradation was observed in corresponding div_{max}^{-1} plots.

4) Surface non-planarities (synth-bump): Figure 6 plots div_{max}^{-1} for 0° and 67.5° approaches towards increasingly perturbed planar surfaces, where div_{max}^{-1} appears to remain stable for b ≤ 0.16 (with the exception of b = 0.08 results around t = 0.75). Stability was found to degrade significantly for b > 0.2. Table III reports mean \theta_m position errors for both approach angles as b is increased. Errors remain stable, and even improve for 67.5° as b increases. However, performance consistency was generally
Fig. 7. div_{max}^{-1} responses for outdoor cement- and grass-landing sequences.
observed to degrade for b > 0.2. At b = 0.32, error increases to 8.4° for 67.5°. No meaningful results were obtained for 0°.

5) Max-div under operating conditions: div_{max}^{-1} and \theta_m were examined over the cement- and grass-landing sequences using the real-time implementation. Figure 7 shows div_{max}^{-1} (normalised) results from both sequences. Results show less conformity with ground truth than the synthetic and indoor sequences. However, a linear trend is preserved, and local temporal consistency in div_{max}^{-1} is also apparent (e.g., Figure 7(b) cement for 0.5 ≤ t ≤ 0.7), suggesting variations are a likely result of genuine camera motion changes. Fluctuations in div_{max}^{-1} are present but appear to be within workable limits. Quantitative assessment of \theta_m accuracy is not possible; however, approach angle estimates (grass: 75.6°, cement: 42.2°) derived from mean \theta_m estimates clearly distinguish the steeper cement- and shallower grass-landing descents.

C. Closed-loop experiment

The max-div control scheme was implemented for closed-loop control of a mobile robot. The omnitech camera was attached to the front of the robot, facing the docking surface. The robot drive system provides omni-directional planar motion. Figure 8 shows the robot and experimental workspace⁶. Robot motion is initially parallel to the surface (20 cm s^{-1}), forcing heading adjustments to achieve the task. Initial distance from the surface was 100 cm for 67.5° trials⁷, and 125 cm for others. Five trials were conducted for each approach angle. Velocity control was implemented as described in Eq. (22), with an additional derivative term:

K_d \left( div_{max}(t) - div_{max}(t - 1) \right),  (26)
where Kd is a derivative control gain. Directional control was implemented using Eq. (25) directly, with heading corrections constrained to the plane of motion. Max-div parameters (i.e., Kv , Kd , and divref ) were empirically tuned along a 0o approach and applied across all approach angles. For comparison, the graze-landing model proposed in [1] was implemented; providing a benchmark for stable nonfrontal approaches using a similar control scheme. Estimates of τ are obtained from the average horizontal flow magnitude within a single 2 × 10 vector horizontal strip about the image centre. Velocity is controlled using Eq. (22) (with −1 derivative term), replacing divref and divmax (t) with τref and 6 Trial video footage (including divergence and flow estimation) available at http://cecs.anu.edu.au/˜cdmcc/maxdiv 7 due to physical constraints of workspace.
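To make the velocity law concrete, the sketch below runs a minimal 1-D frontal-approach simulation under this scheme. Since flow divergence for a frontal approach is proportional to v/d, regulating div_max towards div_ref drives v towards div_ref · d, producing near-exponential velocity decay. The proportional form assumed for Eq. (22), the sign convention of the derivative term, the gains, and the stopping threshold are all assumptions for illustration; only the initial conditions (125 cm, 20 cm s⁻¹) mirror the trials described above.

```python
# Minimal 1-D sketch of divergence-regulated approach control.
# The proportional form assumed for Eq. (22), the derivative sign
# convention, and all gain values are illustrative assumptions,
# not the paper's tuned parameters.

def velocity_command(v_prev, div_now, div_last, div_ref, k_v, k_d):
    """One control update: proportional term (assumed Eq. (22) form)
    plus the derivative term of Eq. (26)."""
    p = k_v * (div_ref - div_now)     # slow down when div exceeds div_ref
    d = k_d * (div_now - div_last)    # Eq. (26): damp rapid div changes
    return max(0.0, v_prev + p - d)   # commanded speed is kept >= 0

def simulate(d0=1.25, v0=0.20, div_ref=0.4, k_v=0.05, k_d=0.02,
             dt=0.1, d_stop=0.1, max_steps=500):
    """Approach a frontal surface from d0 metres at v0 m/s.
    For a frontal approach, divergence scales with v / d."""
    d, v = d0, v0
    div_last = v / d
    history = []
    for _ in range(max_steps):
        if d <= d_stop:
            break
        div_now = v / d               # frontal-approach divergence proxy
        v = velocity_command(v, div_now, div_last, div_ref, k_v, k_d)
        div_last = div_now
        d -= v * dt
        history.append((d, v))
    return history
```

Running `simulate()` yields a velocity trace that rises briefly (the initial divergence lies below div_ref) and then decays towards zero as the surface nears, qualitatively matching the decay profiles of Figure 10(a).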
Fig. 8. (a) Side view of the experimental workspace and robot, and (b) a view from the on-board camera. The blue and white crosses indicate the reference and estimated max-div locations, respectively.
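The benchmark's τ input described above, i.e., the mean horizontal flow magnitude over a 2 × 10 strip about the image centre, can be sketched as follows. Treating that mean magnitude as proportional to τ⁻¹ via an assumed scale factor is an illustrative simplification, not the paper's calibrated mapping.

```python
def tau_inverse_proxy(strip_u, scale=1.0):
    """Proxy for the inverse time-to-contact used by the graze-landing
    benchmark: the mean horizontal flow magnitude over the 2 x 10 strip
    of u-components about the image centre. The scale factor relating
    mean flow magnitude to 1/tau is an assumption for illustration."""
    if len(strip_u) != 2 or any(len(row) != 10 for row in strip_u):
        raise ValueError("expected a 2 x 10 strip of horizontal flow values")
    magnitudes = [abs(u) for row in strip_u for u in row]
    return scale * sum(magnitudes) / len(magnitudes)
```

For example, a strip with uniform horizontal flow of ±2 pixels/frame gives a proxy value of 2.0 (times the assumed scale).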
[Plot omitted: overhead tracking in calibrated overhead image coordinates (x, y), with traces for 0°, 22.5°, 45°, and 67.5° approaches.]
Fig. 9. Overhead tracking plots for: (a) sample 0° and 67.5° approaches in the overhead images; and (b) all max-div trials.
                      0°       22.5°    45°      67.5°
Avg approach angle    −4.8°    23.4°    30.3°    53.3°
Avg stop dist         11.7 cm  12.4 cm  14.0 cm  14.6 cm

TABLE IV
PERFORMANCE STATISTICS FOR CLOSED-LOOP ROBOT DOCKING/LANDING TRIALS OF THE MAX-DIV SCHEME.
the measured time-to-contact signal, τ⁻¹(t). Note that this scheme requires control tuning for every approach angle.

Figure 9 shows the paths taken across all max-div trials. Position is given as the tracked robot centre in calibrated overhead images. The approach angles are clearly distinguished, with consistent paths taken across each trial set. Table IV shows the average angle of approach and stopping distance recorded for each trial set (measured from the docking surface). Stopping distances grow marginally with increasing approach angle, but are contained within a 3 cm range. Approach angle error increases similarly, but remains well bounded.

Figure 10 shows velocity-time profiles across the 67.5° and 0° trials for both max-div and the graze-landing model. Max-div achieves the expected exponential velocity decay for both approaches (as reported in [1] and replicated in the 67.5° graze-landing results). As expected, the graze-landing model breaks down for near-frontal approaches, with 10° being the experimentally determined fail case.8 As noted, control for the graze-landing model required separate tuning for each angle of approach, whereas for max-div, tuning was required only once.

VII. DISCUSSION

Results validate the max-div property and provide strong support for the viability of max-div as an input for visuo-motor

8 Stability was found to degrade significantly from 22.5° downward.
[Figure panels omitted: velocity-time profiles (velocity in encoder clicks vs. time), five trials per panel, for the max-divergence model at 67.5° and 0°, and the graze-landing model at 67.5° and 10°.]
Fig. 10. Velocity-time profiles recorded during all on-board 67.5° and 0° trials for: (a) the max-div model, and (b) the graze-landing model. No meaningful results were obtained with the graze-landing scheme for 0°; the observed fail case, 10°, is shown instead.
control. The open-loop results demonstrate stable estimation of surface proximity from max-div across a complete range of approach angles. The closed-loop results confirm this, showing stable docking performance across all approach angles. The synth-texture and synth-bump results indicate low sensitivity to reduced surface texture and to perturbations of the planar surface, with further support from the hand-held outdoor cement- and grass-landing results. Note, however, that the divergence estimation implementation is not prescribed by the proposed control scheme, and more robust methods such as closed contours (e.g., [8], [15], [27]) may be adapted where appropriate (e.g., where surface texture is sparse). Max-div stability was also observed to improve as the surface drew closer, providing a natural advantage for docking/landing control.

VIII. CONCLUSION

We have presented a novel visuo-motor control input, the point of maximum divergence (max-div), enabling the design of new control laws for performing controlled approaches to surfaces of arbitrary orientation. The use of max-div removes restrictions on camera motion without requiring egomotion recovery, providing a general and unified solution for visuo-motor docking/landing. We have presented the first formal proof of the kinematic properties governing max-div, together with a set of open- and closed-loop experiments, including synthetic and real image sequences (indoor and outdoor), demonstrating the viability of max-div and the proposed control laws.

REFERENCES

[1] M. V. Srinivasan, S. W. Zhang, J. S. Chahl, E. Barth, and S. Venkatesh, "How honeybees make grazing landings on flat surfaces," Biological Cybernetics, vol. 83, pp. 171–183, 2000.
[2] H. Wagner, "Flow-field variables trigger landing in flies," Nature, vol. 297, pp. 147–148, 1982.
[3] D. N. Lee, M. N. O. Davies, P. R. Green, and F. R. van der Weel, "Visual control of velocity of approach by pigeons when landing," Journal of Experimental Biology, vol. 180, pp. 85–104, 1993.
[4] D. N. Lee, "A theory of visual control of braking based on information about time to collision," Perception, vol. 5, no. 4, pp. 437–459, 1976.
[5] F. C. Rind, "Collision avoidance: from the locust eye to a seeing machine," in From Living Eyes to Seeing Machines, M. V. Srinivasan and S. Venkatesh, Eds., 1997, pp. 105–125.
[6] R. M. Robertson and A. G. Johnson, "Collision avoidance of flying locusts: steering torques and behaviour," Journal of Experimental Biology, vol. 183, pp. 35–60, 1993.
[7] R. C. Nelson and J. Y. Aloimonos, "Obstacle avoidance using flow field divergence," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 10, pp. 1102–1106, 1989.
[8] N. Ancona and T. Poggio, "Optical flow from 1D correlation: application to a simple time-to-crash detector," in ICCV, 1993, pp. 209–214.
[9] J.-C. Zufferey and D. Floreano, "Fly-inspired visual steering of an ultralight indoor aircraft," IEEE Trans. Robot., vol. 22, no. 1, pp. 137–146, 2006.
[10] D. Coombs, M. Herman, T. Hong, and M. Nashman, "Real-time obstacle avoidance using central flow divergence, and peripheral flow," IEEE Trans. Robot. Automat., vol. 14, no. 1, pp. 49–59, 1998.
[11] S. Bermúdez, P. Pyk, and P. Verschure, "A fly-locust based neuronal control system applied to an unmanned aerial vehicle: the invertebrate neuronal principles for course stabilization, altitude control and collision avoidance," Int. J. Robot. Res., vol. 26, no. 7, pp. 759–772, 2007.
[12] M. Tistarelli and G. Sandini, "On the advantages of polar and log-polar mapping for direct estimation of time-to-impact from optical flow," IEEE Trans. Pattern Anal. Machine Intell., vol. 15, no. 4, pp. 401–410, 1993.
[13] J. Santos-Victor and G. Sandini, "Visual behaviors for docking," CVIU, vol. 67, no. 3, pp. 223–238, 1997.
[14] J. S. Chahl, M. V. Srinivasan, and S. W. Zhang, "Landing strategies in honeybees and applications to uninhabited airborne vehicles," Int. J. Robot. Res., vol. 23, no. 2, pp. 101–110, 2004.
[15] R. Cipolla and A. Blake, "Image divergence and deformation from closed curves," Int. J. Robot. Res., vol. 16, no. 1, pp. 77–96, 1997.
[16] P. Questa, E. Grossmann, and G. Sandini, "Camera self orientation and docking maneuver using normal flow," in Proceedings of SPIE, vol. 2488, 1995, pp. 274–283.
[17] J. J. Koenderink and A. J. van Doorn, "Local structure of movement parallax of the plane," J. Opt. Soc. Am., vol. 66, no. 7, pp. 717–723, 1976.
[18] ——, "Exterospecific component of the motion parallax field," J. Opt. Soc. Am., vol. 71, no. 8, pp. 953–957, 1981.
[19] T. Brodsky, C. Fermüller, and Y. Aloimonos, "Directions of motion fields are hardly ever ambiguous," IJCV, vol. 26, no. 1, pp. 5–24, 1998.
[20] J. J. Koenderink and A. J. van Doorn, "Invariant properties of the motion parallax field due to the movement of rigid bodies relative to an observer," Optica Acta, vol. 22, no. 9, pp. 773–791, 1975.
[21] M. Subbarao, "Bounds on time-to-collision and rotational component from first-order derivatives of image flow," Comp. Vis. Graph. Img. Proc., vol. 50, pp. 329–341, 1990.
[22] C. McCarthy, N. Barnes, and R. Mahony, "A robust docking strategy for a mobile robot using flow field divergence," IEEE Trans. Robot., vol. 24, no. 4, pp. 832–842, 2008.
[23] F. G. Meyer, "Time-to-collision from first-order models of the motion field," IEEE Trans. Robot. Automat., vol. 10, no. 6, pp. 792–798, 1994.
[24] J. Santos-Victor and G. Sandini, "Uncalibrated obstacle detection using normal flow," Mach. Vis. App., vol. 9, no. 3, pp. 130–137, 1996.
[25] W. Green, P. Oh, and G. Barrows, "Flying insect inspired vision for autonomous aerial robot maneuvers in near-earth environments," in ICRA, vol. 3, 2004, pp. 2347–2352.
[26] P. Questa and G. Sandini, "Time to contact computation with a space-variant retina-like CMOS sensor," in IROS, vol. 3, 1996, pp. 1622–1629.
[27] Z. Duric, A. Rosenfeld, and J. Duncan, "The applicability of Green's theorem to computation of rate of approach," IJCV, vol. 31, no. 1, pp. 83–98, 1999.
[28] M. I. A. Lourakis and S. C. Orphanoudakis, "Using planar parallax to estimate the time-to-contact," in CVPR, vol. 2, 1999, pp. 640–645.
[29] F. Ruffier and N. Franceschini, "Optic flow regulation: the key to aircraft automatic guidance," Rob. Auto. Sys., vol. 50, pp. 177–194, 2005.
[30] C. Colombo and A. Del Bimbo, "Generalized bounds for time to collision from first-order image motion," in ICCV, vol. 1, 1999, pp. 220–226.
[31] B. Lucas and T. Kanade, "An iterative image registration technique with an application to stereo vision," in Proceedings of the DARPA Image Understanding Workshop, 1981, pp. 121–130.
[32] J.-Y. Bouguet, "Pyramidal implementation of the Lucas-Kanade feature tracker: description of the algorithm," OpenCV Documentation, Intel Corporation, 2000.