Body-relative Navigation Using Uncalibrated Cameras

Body-relative Navigation Using Uncalibrated Cameras Olivier Oli i K Koch h [email protected]

Seth S hT Teller ll [email protected]

Massachusetts Institute of Technology Computer Science and Artificial Intelligence Laboratory

Motivation Navigation guidance to humans – Finding our way in complex/new environments

Whyy is it hard? – No external source of localization (GPS) – Unknown environment (no map)

Why should you care? – Soldiers in the field – Visually impaired – Guidance in public places (hospitals, museums)

3/17/2009

Olivier Koch

2

Vision-based navigation

Four Pointgrey Firefly MV Cameras (640x480 8‐bit grayscale images) FOV: 360° (h) x 90° (v)

Why vision? 9 Light, inexpensive, compact c information o at o ((vss laser ase rangerfinders) a ge de s) 9 Rich 9 No temporal drift (vs inertial sensors) 3/17/2009

Olivier Koch

3

Uncalibrated cameras

Four Pointgrey Firefly MV Cameras (640x480 8‐bit grayscale images) FOV: 360° (h) x 90° (v)

Why use uncalibrated cameras? ¾ Intrinsic calibration is tedious t s c calibration ca b at o is s hard a d for o body body-worn o app applications cat o s ¾ Extrinsic 3/17/2009

Olivier Koch

4

Problem statement Input Live video stream from wearable set of uncalibrated cameras

start end

3/17/2009

Olivier Koch

5

Problem statement Input

Output

Live video stream from wearable set of uncalibrated cameras

Body-relative guidance for: Homing (going back to start point) Replay (from start point to end point) Point-to-point navigation

start end

3/17/2009

Olivier Koch

6

Sample dataset

3/17/2009

Olivier Koch

7

Method overview Exploration path: undirected graph (place graph) Node: physical location in the world p y Edge: physical path between two nodes traversed by the user 9 Makes no assumption on user motion between nodes Makes no assumption on user motion between nodes

3/17/2009

Olivier Koch

8

Method overview Local node orientation: direction of the user leaving the node ¾ Assume smooth user motion Local node observations (visual features) ¾ Assume distinctive feature visibility 9 No global coordinate frame

3/17/2009

Olivier Koch

9

Method overview Node‐to‐node “hopping” problem 9 Does not require metric mapping of the environment q pp g ¾ Assumes that user stays in the graph during guidance

Determine location of user in the graph (local node estimation)

3/17/2009

Olivier Koch

Guide the user at that location (rotation guidance)

10

Method overview Loop closure detection

3/17/2009

Olivier Koch

11

Limitations & advantages Limitations ¾ User leaving exploration path ¾ Smooth user motion ¾ Distinctive features visibility

Advantages P Provides id iintuitive, t iti b body-relative d l ti guidance id Requires no extrinsic or intrinsic camera calibration Scales to arbitrary large environments

3/17/2009

Olivier Koch

12

Related work Visual Simultaneous Localization and Mapping (SLAM)

Davison et al al., MonoSLAM: Real-Time Real Time Single Camera SLAM SLAM, PAMI ’07 07 J. Neira et al., Data association in O(n) for Divide and Conquer SLAM, RSS ’07 Wolf et al al., Robust Vision-Based Vision Based Localization by Combining an Image Retrieval System with Monte Carlo Localization, IEEE Transactions Robotics ’05 Konolige, Agrawal et al., . Mapping, Navigation and Learning for Off-road Traversal, Journal of Field Robotics ‘08

Metric and topological localization

3/17/2009

Zhang & Kosecka, Hierarchical Building Recognition, Image and Vision Computing ‘07 B. Kuipers, Using the topological skeleton for scalable global metrical mapbuilding, IROS ’04 Olivier Koch

13

Related work Appearance-based navigation

3/17/2009

Cummins & Newman, Probabilistic Appearance Based Navigation and Loop Closing, ICRA ’07 Collet, Landmark learning and guidance in insects, Ph. Trans. Roy. Soc. London, 1992 Chen & Birchfield, Qualitative vision-based mobile robot navigation, ICRA 06 ICRA’06 Zhang & Kleeman, Robust appearance-based visual route following for navigation in large-scale outdoor environments, IJRR’09

Olivier Koch

14

Method overview

3/17/2009

Olivier Koch

15

Method overview

3/17/2009

Olivier Koch

16

The place graph World as an undirected g graph p G = ((V,, E)) Object

Represents…

Data Structure

ode Node

Location oca o in the e world o d

Visual sua features ea u es (e (e.g. g S SIFT))

Edge

Physical path between two nodes

N/A

3/17/2009

Olivier Koch

17

The place graph Similarityy function Ψ() () – Input: two sets of features F1, F2 – Output: average L2-distance for all feature matches between F1 and F2

Creating a node whenever Ψ > δ

In practice, new node every three seconds (5 meters) at human-walking speed.

3/17/2009

Olivier Koch

18

The place graph

3/17/2009

Olivier Koch

19

Method overview

3/17/2009

Olivier Koch

20

Local node estimation Input p Output

p position of the user in the map p at time t-1 observations at time t position in the map at time t

User motion = Markov process Recursive Bayesian estimation – State xk: position in the map at time k (node label) – Measurement zk: observations at time k (SIFT)

3/17/2009

Olivier Koch

21

Local node estimation p ( xk | z k −1 ) = ∑ p ( xk | xk −1 ) p ( xk −1 | z k −1 ) (prediction) p ( xk | z k ) = λ p ( z k | xk ) p ( xk | z k −1 ) p ( z k | xk ) ~ 1

3/17/2009

ε + ψ ( xk , z k )

Olivier Koch

(update)

p ( xk | xk −1 ) = N (0, σ )

22

Local node estimation p ( xk | z k −1 ) = ∑ p ( xk | xk −1 ) p ( xk −1 | z k −1 ) (prediction) p ( xk | z k ) = λ p ( z k | xk ) p ( xk | z k −1 ) p ( z k | xk ) ~ 1

ε + ψ ( xk , z k )

(update)

p ( xk | xk −1 ) = N (0, σ )

Compute pdf over a local neighborhood of current position only No new node creation

3/17/2009

Olivier Koch

23

Local node estimation

3/17/2009

Olivier Koch

24

Method Overview

3/17/2009

Olivier Koch

25

Rotation Guidance Input p Output

current p position of user in the g graph p current observations guidance to next node in user’s body frame

Approach visual learning

3/17/2009

Olivier Koch

26

The relative orientation problem Problem Estimate the relative user orientation between two visits of the same location (in 2D)

3/17/2009

Olivier Koch

27

The relative orientation problem Assuming g intrinsic & extrinsic camera calibration World features = bearing measurements (α,β,…)

3/17/2009

Olivier Koch

28

The relative orientation problem Assuming g no intrinsic & extrinsic camera calibration – Bearing α → coarse bearing α – α : average of all possible measurements on the camera

Four cameras, covering g each 90° of FOV

3/17/2009

Olivier Koch

29

The relative orientation problem Principle p For a large g number of measurements,, and given the assumptions below, using α instead of α yields a statistically valid estimate of the relative orientation γ. Assumptions – Observations are uniformlyy distributed in image g space p – Observations are made from the same vantage point during revisit

3/17/2009

Olivier Koch

30

The relative orientation problem α α

f/2

3/17/2009

Olivier Koch

31

The match matrix

3/17/2009

Olivier Koch

32

The match matrix H(i,j) ( ,j) = user rotation associated to a match btw camera i and camera j H = coarse approximation of the full camera calibration H is anti-symmetric

H satisfies the “circular circular equality” equality

3/17/2009

Olivier Koch

33

Learning the match matrix Training g Phase – Learn match matrix from training data – Once for a given camera configuration – Does not depend on training environment

Training algorithm – User U rotates t t in i place l iin arbitrary bit environment i t – Algorithm “learns” the match matrix

3/17/2009

Olivier Koch

34

Learning the match matrix η

η User rotates in place nr times in an arbitrary environment (nr =2) F For each pair of frames (f h i ff (fi,ffj) in the training sequence: )i h i i Estimate corresponding user rotation η (e.g. assuming constant rot. speed) Compute feature matches between fi and fj For each match m between a feature on camera r and a feature on camera s, update , p H(r,s) with η 3/17/2009

Olivier Koch

35

Learning the match matrix η

η Training algorithm ¾ 3/17/2009

Runs in arbitrary environment Done once for a given camera configuration Fast (a few minutes) and simple Quadratic complexity in # frames and # features/frame Olivier Koch

36

Learning the match matrix

3/17/2009

Olivier Koch

37

Rotation guidance using the match matrix

3/17/2009

Olivier Koch

38

Rotation guidance using the match matrix

3/17/2009

Olivier Koch

39

Method Overview

3/17/2009

Olivier Koch

40

Loop closure detection 1. Compute node similarity

3/17/2009

2. Extract similar sequences

Olivier Koch

3. Update place graph

41

Loop closure detection 1. Compute node similarity

“Bags of words” word w = (cw, rw) words store list of node labels

Incremental vocabulary Optimized search using search tree 9 Fully incremental 9 No a priori vocabulary Filliat, Interactive Learning of Visual Topological Navigation, IROS’08 3/17/2009

Olivier Koch

42

Loop closure detection 2. Extract similar sequences

Smith & Waterman algorithm Inspired from molecular biology Output: similar node subsequences 9 Robust loop closure detection ¾ Does not detect “instantaneous” loop closure

Ho & Newman, Detecting Loop Closure with Scene Sequences, IJCV’07 3/17/2009

Olivier Koch

43

Loop closure detection 3. Update place graph

Merge node sequences

3/17/2009

Olivier Koch

44

Loop closure detection

3/17/2009

Olivier Koch

45

Match matrix Anti-symmetry: y y error ~ 14.5° Circular equality: error ~ 1.5°

H

0

3/17/2009

⎛ 0 ⎜ ⎜ − 90 = ⎜ 180 ⎜ ⎜ 90 ⎝

90

180

0

90

− 90

0

180

− 90

− 90 ⎞ ⎟ 180 ⎟ 90 ⎟ 0 Olivier Koch

⎠ 46

Potential sources of error

3/17/2009

Constant rotation speed p during g training g Non-homogeneous feature distribution in image space Baseline due to translation during revisit Feature mismatches

Olivier Koch

47

Rotation guidance vs IMU User rotates in place in arbitrary environment Compare rotation guidance against IMU Standard deviation: 8.5° (max error: 20°)

3/17/2009

Olivier Koch

48

Large-scale rotation baseline

First image

3/17/2009

Second image (matches in blue) Aligned images

High-resolution camera pointing upward Average difference between SIFT feature orientations Standard error (vs IMU) < 2° Ground-truth Ground truth throughout exploration path Requires no intrinsic/extrinsic camera calibration Olivier Koch

49

Rotation guidance vs ground-truth

200 checkpoints Standard S d d error: 10 10.5°° (max ( error: 15°) 1 °) 3/17/2009

Olivier Koch

50

Local node estimation vs ground-truth

3/17/2009

Olivier Koch

51

Real-world explorations

GALLERIA dataset GALLERIA dataset

3/17/2009

CORRIDORS dataset CORRIDORS dataset

Olivier Koch

52

Off-path trajectories (GALLERIA dataset) Place graph node Guidance output F il Failure case

Rotation guidance overlaid on 2D map. Values are exact, not notional. 3/17/2009

Olivier Koch

53

Off-path trajectories (CORRIDORS dataset)

3/17/2009

Olivier Koch

54

Rotation guidance (CORRIDORS dataset)

3/17/2009

Olivier Koch

55

Rotation guidance (CORRIDORS dataset)

First visit (SIFT features in yellow)

3/17/2009

Revisit (body‐relative rotation guidance)

Olivier Koch

56

Loop-closure (CORRIDORS dataset)

Exploration path manually overlaid on 2D map 1,500 meters (30 min.)

3/17/2009

Olivier Koch

Place graph (spring‐mass model) 500 nodes (before loop closure) 197 nodes (after) 197 nodes (after)

57

Conclusion Assumptions ¾ ¾ ¾ ¾

Large number L b off visual i l ffeatures t visible i ibl att allll titime Uniform distribution of observations in image space Rigid-body transformation between cameras is fixed but can change slightly Training a gp phase ase (s (short, o ,o once ce for o a ca camera e a co configuration) gu a o )

Advantages 9 9 9 9

Requires no extrinsic or intrinsic camera calibration Scales to large environments (several km) Provides guidance in the user’s body frame Robust to off-path trajectories and high-frequency user motion

Future Work 3/17/2009

Extend to 3D motion (stair ascent/descent) User study on multiple real human users Application to robotics Olivier Koch

58

Questions

3/17/2009

Olivier Koch

59