Computer Vision for Wearable Visual Interface (Slides)

Computer Vision for Wearable Visual Interface

Walterio Mayol
Department of Computer Science, Faculty of Engineering, University of Bristol

Takeshi Kurata National Institute of Advanced Industrial Science and Technology (AIST)


Table of Contents
† Introduction: Why visual sensors?
† Fundamentals of camera models
† Where to wear an extra eye?
† Kinds of cameras
† Devising real-time wearable visual systems
† [11:30-12:30] Lunch
† Relevant computer vision techniques
† Applications
† Discussion of social issues with wearable cameras and conclusions

W. Mayol

ISWC2005 in Osaka, Japan

T. Kurata


Introduction: Why visual sensors?

“These days, cameras are cheap, small and ubiquitous, and mobile computing power is improving constantly”


Why visual sensors?
† Advantage of visual sensors: they can recover several properties of the world:
„ Identity of objects.
„ Ambient properties such as colours, motion, etc.
„ 3D structure (multiple cameras, or when in motion).
„ Provide a signal compatible with the pinnacle of human senses.
„ Detect people and their activities.
„ Among others…


Cameras help wearables

† Some key applications of wearable computers:
„ Memory enhancement.
„ Ability enhancement.
„ Data collection.


E.g. Casual Capture of Events Images, panoramas and videos automatically detected offline from camera motion.

Headmounted camera

Summary of Phil’s family holidays in the Dolomites:

Custom recorder, HP Labs.
Phil Cheatle, "Media Content and Type Selection from Always-on Wearable Video", pp. 979-982, ICPR 2004.


E.g. Game assistant

© Sam Ogden

HMD+Camera

T. Jebara, C. Eyster, J. Weaver, T. Starner and A. Pentland. "Stochasticks: Augmenting the Billiards Experience with Probabilistic Vision and Wearable Computers". ISWC97.


Fundamentals of camera models
† Camera model
„ Perspective (projective), or a perspective approximation (affine)?
„ Do you need Euclidean shape/motion?

† A 2-D transform (affine, homography, etc.) may be enough.
„ Object: planar or not?
„ Motion: just rotation, or involving translation?


Perspective Camera
† Also called a pinhole camera
† Non-linear (projection divides by depth)

s \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
\begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

Affine Camera: Orthographic (orthogonal)
† The simplest affine approximation
† Ignores the distance and position effects

s \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
f \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}


Affine Camera: Weak Perspective
† Also called scaled orthographic
† Zero-order approximation
† Models the distance effect

s \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
f \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & Z_c \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

(Z_c: depth of the object centroid)


Affine Camera: Paraperspective
† First-order approximation
† Models the distance and position effects
† Still linear
† Needs the focal length to be known

s \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} =
f \begin{pmatrix} 1 & 0 & -X_c/Z_c & X_c \\ 0 & 1 & -Y_c/Z_c & Y_c \\ 0 & 0 & 0 & Z_c \end{pmatrix}
\begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}

((X_c, Y_c, Z_c): the object centroid)
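As an aside (not from the slides), the difference between the full perspective model and the weak-perspective approximation on the preceding slides can be seen numerically; all numbers below are invented for illustration.

```python
# Illustrative sketch: comparing the perspective and weak-perspective camera
# models on a single 3D point. Focal length and depths are made-up values.

def perspective(X, Y, Z, f):
    """Pinhole projection: x = f*X/Z, y = f*Y/Z (non-linear in Z)."""
    return f * X / Z, f * Y / Z

def weak_perspective(X, Y, Zc, f):
    """Scaled orthographic projection: every point uses the centroid depth Zc."""
    return f * X / Zc, f * Y / Zc

f = 500.0   # focal length in pixels (assumed)
Zc = 10.0   # centroid depth of a small object
for Z in (9.5, 10.0, 10.5):   # points slightly in front of / behind the centroid
    xp, _ = perspective(1.0, 0.0, Z, f)
    xw, _ = weak_perspective(1.0, 0.0, Zc, f)
    # the two models agree at Z == Zc and drift apart as |Z - Zc| grows
    print(round(xp, 2), round(xw, 2))
```

The zero-order nature of weak perspective is visible directly: the projected coordinate is constant over the object, while the perspective projection varies with each point's own depth.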

Shape-from-motion & Wearable
† Affine factorization method
„ Robust estimation (M-estimator) to obtain dominant 2D motion
„ SVD to decompose affine shape and motion, with the LMedS (Least Median of Squares) method
„ Metric constraints for each specific camera model
„ T. Kurata, J. Fujiki, M. Kourogi, K. Sakaue, "A Fast and Robust Approach to Recovering Structure and Motion from Live Video Frames", CVPR 2000.


2-D Transform & Wearable: Panorama-based Annotation


† Pre-registered panoramic images are used to estimate the position and direction of users with a 2-D image transform.
† M. Kourogi, T. Kurata, K. Sakaue, and Y. Muraoka, "Improvement of Panorama-based Annotation Overlay Using Omnidirectional Vision and Inertial Sensors", ISWC 2000.


Image mosaicing with wearables
† 2-D affine or homography
† Real World OCR (Optical Character Reader)
„ Make a panoramic image by stitching (mosaicing)
„ Texture analysis to detect text regions
„ K. Jung, K. Kim, T. Kurata, M. Kourogi, J. Han, "Text Scanner with Text Detection Technology on Image Sequences", ICPR 2002.
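A minimal sketch (not the authors' mosaicing code) of why a 2-D affine transform is cheap to estimate for stitching: x' = a·x + b·y + c, y' = d·x + e·y + f has six unknowns, so three non-collinear point correspondences determine it exactly.

```python
# Illustrative: solve a 2-D affine transform from three correspondences
# via Cramer's rule. All point values are invented.

def affine_from_3pts(src, dst):
    """Return (a, b, c, d, e, f) mapping src points onto dst points."""
    (x1, y1), (x2, y2), (x3, y3) = src
    det = x1 * (y2 - y3) - y1 * (x2 - x3) + (x2 * y3 - x3 * y2)
    if abs(det) < 1e-12:
        raise ValueError("source points are collinear")
    def solve(v1, v2, v3):
        # Cramer's rule for [x y 1][a b c]^T = v, one output coordinate at a time
        a = (v1 * (y2 - y3) - y1 * (v2 - v3) + (v2 * y3 - v3 * y2)) / det
        b = (x1 * (v2 - v3) - v1 * (x2 - x3) + (x2 * v3 - x3 * v2)) / det
        c = (x1 * (y2 * v3 - y3 * v2) - y1 * (x2 * v3 - x3 * v2)
             + v1 * (x2 * y3 - x3 * y2)) / det
        return a, b, c
    xs = [p[0] for p in dst]
    ys = [p[1] for p in dst]
    return solve(*xs) + solve(*ys)

# A pure translation by (5, -2) should give a=1, b=0, c=5, d=0, e=1, f=-2
params = affine_from_3pts([(0, 0), (1, 0), (0, 1)],
                          [(5, -2), (6, -2), (5, -1)])
print([round(p, 6) for p in params])   # → [1.0, 0.0, 5.0, 0.0, 1.0, -2.0]
```

In practice mosaicing uses many correspondences with a robust least-squares fit, but the parameter count is the same.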


Where to wear an extra eye?

Positions (A-J) are reported in the literature for wearing static cameras. A) Mann. B) Starner et al. C) Foxlin. D) Starner et al. E) Clarkson and Pentland. F) Rungsarityotin et al. G) Molton. H) Kurata et al. I) Fitzpatrick et al. J) Vardy et al.


Where to wear an extra eye?

[Figure: example wearing positions] Head: Mann; Starner et al. Chest. Wrist (with attitude sensor, fish-eye camera and IR LED): Vardy et al.; Fujimoto et al. Shoulder: Kurata; Mayol. Foot: Kemp et al.

Simulating camera placement

Mayol et al. 2001

Articulated humanoid model with 1800 polygons. Virtual cameras around the shape to evaluate sensor performance. Matlab model at: http://www.robots.ox.ac.uk/~wmayol/3D/nancy_matlab.html


Simulating camera placement


Simulating camera placement

Field of View (FOV) for a camera ~4 cm from the body. Mayol, Tordoff & Murray 2001.


How far away?


Purposive placement


Example of criteria fusion FOV + Motion + Handling view


Sensor placement

Zone     | FOV     | Best suited to task        | Body motion
Head     | Wide    | Linked to user's attention | Large
Shoulder | Wide    | Most tasks                 | Small
Chest    | Reduced | Linked to user's workspace | Small

Kinds of Camera

[Figure: camera types] Narrow FOV; rotating narrow FOV (mirror); fish-eye; catadioptric. Images from Nayar 1997.

Some notes on hardware for prototypes
Current common interfaces:
„ USB (currently cheap, with interlaced sensors)
„ Firewire (currently more expensive; better sensor & control)
„ Analog (small units; needs a capture card)
„ Direct digital (directly interface the chip)

Unibrain.com, ptgrey.com: Firewire cameras w/ progressive scan, full control of shutter speed, WB, frame rate, etc.

Interlaced scanning vs. progressive scanning.

Single-lens Camera

Most studied case.
† More suitable for high resolution of a particular region.
† Small volume.
† Wearable positions: head looking ahead, head looking at the hands, chest, foot, etc.

Narrow FOV: StartleCam, Healey & Picard, ISWC 1998. Fisheye lens: Clarkson & Pentland, MIT 2001.

From US PAT2299682

Catadioptric: mirror-lens

W. Rungsarityotin, T. Starner, Finding Location Using Omnidirectional Video on a Wearable Computing Platform, ISWC 2000.

† Easy to obtain views around the user.
† Usually low resolution and large volume.
† Hard to decide where to wear it.
K. Nishino & S. Nayar, The World in an Eye, CVPR 2004


Trinocular and more
† 3D visual information => 3D auditory information
„ Y. Kawai, F. Tomita, "A Support System for Visually Impaired Persons to Understand Three-dimensional Visual Information Using Acoustic Interface", ICPR 2002.

† Two-way stereo (near/distant)
„ miniBEE by ViewPLUS: 150 g, 2.0 W, USB 2.0, www.viewplus.co.jp
„ One stereo baseline for the distant view (distant objects), one for the near view (near objects)


What kind of camera?

Compound eye with human-like resolution. Images: Kirschfeld (1976).


A lesson from The Animal Kingdom?

Salticidae (jumping) spiders: 8 eyes in total; a pair of "mobile" narrow-FOV, high-acuity eyes plus wider-FOV, low-acuity eyes. Images from: Wayne Maddison, Tree of Life web project, www.tolweb.org

Further reading: M. F. Land & D-E. Nilsson, Animal Eyes, Oxford University Press, 2002.


Devising real-time wearable visual systems
† Computational/software architectures
„ Device
„ Software environments

† Active vision approach
„ Light control (IR vision)
„ Camera head (pan/tilt/roll/zoom) control


Computational/Software architectures
† Mobile device (smartphone, PDA)
„ J2ME
„ QUALCOMM BREW & C++
„ Windows Mobile (for Smartphone)
„ Symbian OS & C++
„ Embedded Linux, ITRON, etc.

† WearComp
„ Embedded computer, laptop (subnotebook) PC

† Library, SDK, Toolkit…
„ IPP, OpenCV, DirectShow, ARToolkit, Georgia Tech Gesture Toolkit


Cell Phone as a platform for CV
Thomas Riisgaard Hansen
‡ http://www.daimi.au.dk/~thomasr/
‡ http://www.pervasive-interaction.org/Mixis/
‡ http://chess.eecs.berkeley.edu/publications/talks/05/hansen0517.pdf

† J2ME
„ Pro: portability, easy to get started
„ Con: limited functionality, slow

† Symbian OS & C++
„ Pro: runs on a lot of phones, fast; programmers can control a lot of functions
„ Con: limited compatibility, hard to program

† Windows Mobile for SmartPhone
„ Pro: it is Microsoft; resembles the programming you do on Windows PCs
„ Con: runs only on a limited number of devices


Running on a cell phone: MIXIS
† T.R. Hansen, E. Eriksson, A. Lykke-Olesen, "Mixed Interaction Spaces – Designing for Camera Based Interaction with Mobile Devices", CHI 2005.

† Tracking a fixed point with a cell phone
„ Randomized Hough circle detection
† Much faster than non-randomized Hough algorithms
† A hand-drawn black circle is tracked
„ Optimized version of CamShift (Continuously Adaptive Mean Shift)
† G.R. Bradski, "Computer Vision Face Tracking For Use in a Perceptual User Interface", Intel Technology Journal Q2, 1998.
† The fixed point is always with the user
„ C++ for Symbian OS 7.0s on a Nokia 7610
„ Image resolution: 160x120 or 320x240 pixels
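An assumption-level illustration of the mean-shift idea behind CamShift, referenced above (this is not the MIXIS code): the search window repeatedly moves to the centroid of the "object probability" pixels it currently covers.

```python
# Toy mean-shift tracker over a binary back-projection mask. The mask,
# window size and start position are invented for illustration.

def mean_shift(mask, cx, cy, half=3, iters=10):
    """mask: 2D list of 0/1 values; (cx, cy): current window centre."""
    h, w = len(mask), len(mask[0])
    for _ in range(iters):
        xs, ys, n = 0.0, 0.0, 0
        for y in range(max(0, cy - half), min(h, cy + half + 1)):
            for x in range(max(0, cx - half), min(w, cx + half + 1)):
                if mask[y][x]:
                    xs += x; ys += y; n += 1
        if n == 0:
            break                      # nothing under the window
        nx, ny = round(xs / n), round(ys / n)
        if (nx, ny) == (cx, cy):       # converged
            break
        cx, cy = nx, ny
    return cx, cy

# A 20x20 frame with a 3x3 blob of "target" pixels centred at (12, 8)
mask = [[0] * 20 for _ in range(20)]
for y in range(7, 10):
    for x in range(11, 14):
        mask[y][x] = 1
print(mean_shift(mask, 8, 8))   # window starts off-target, shifts onto the blob → (12, 8)
```

CamShift adds one ingredient on top of this: the window size and orientation adapt each frame to the tracked distribution, which is what makes it cheap enough for a 2005-era phone CPU.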


Running on a cell phone: QR (Quick Response) Code
† Open code (can be used for phishing)
„ QR Code(R) is a registered trademark of DENSO WAVE INCORPORATED in Japan and other countries.
„ http://www.qrcode.com/

† Standardization
„ AIM (Automatic Identification Manufacturers) International, JIS, ISO

† Readable from any direction in 360°
† Dirt and damage resistant
„ QR Code has error-correction capability. Data can be restored even if the symbol is partially dirty or damaged.

† Getting popular (especially in Japan)
„ Newspapers, magazines, posters, and business cards
„ NTT DOCOMO, au by KDDI, Vodafone

[Figure: symbol structure with position detection patterns and data area; example symbol version 5, error correction level M (15%)]

Running on a cell phone: SyncR on BREW
† SynchroReality by OLYMPUS
„ Natural picture/icon as a marker
„ http://syncr.jp/sr/ (in Japanese)

(1) Download a set of templates
(2) See some picture through the camera
(3) Display the relevant info
(4) Access the detailed info


Running on a cell phone: OKAO Vision (OKAO = face in Japanese)
† Face-recognition product by OMRON
„ http://www.omron.com/r_d/vision/01.html

† Symbian OS, BREW, embedded Linux, ITRON
† 0.2% FRR (False Reject Rate) at 1.0% FAR (False Acceptance Rate)
† Fast (1.0 sec. on an ARM9 core at 200 MHz)
† Previously employed Gabor-wavelet transform and graph matching, but the current method is confidential :-)


Running on a cell phone: Text Region extraction & OCR
† Work by Keiichi TAMAI et al., OMRON SOFTWARE
„ http://www.omronsoft.com/index.html

† Cell-phone OCR appeared in the market in November 2002.
† Compensation for defocus, motion blur, and (lens) shading: LoG (Laplacian of Gaussian) filter, partial equalization
† Compensation for roll: Hough transform
† Performance: 1.48 sec (ARM9 core, 196 MHz), 89.4% recall, about 50% precision

[Figures: example of a LoG kernel; intensity histograms (number of pixels vs. intensity) separating background and character pixels]
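A small sketch of the LoG filter mentioned above (this mirrors the idea only, not the OMRON implementation): the kernel's near-zero sum means it responds to intensity changes such as character edges, not to flat or shaded background regions.

```python
# Illustrative: build a sampled Laplacian-of-Gaussian kernel. Size and
# sigma are arbitrary demo values.
import math

def log_kernel(size, sigma):
    """Sampled LoG, normalised so the kernel sums to (almost) zero."""
    half = size // 2
    k = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            dx, dy = x - half, y - half
            r2 = dx * dx + dy * dy
            # LoG(x, y) ∝ ((r^2 - 2*sigma^2) / sigma^4) * exp(-r^2 / (2*sigma^2))
            k[y][x] = ((r2 - 2 * sigma ** 2) / sigma ** 4) * math.exp(-r2 / (2 * sigma ** 2))
    mean = sum(map(sum, k)) / size ** 2
    return [[v - mean for v in row] for row in k]   # force zero DC response

k = log_kernel(7, 1.0)
print(abs(sum(map(sum, k))) < 1e-9)   # zero response on constant regions → True
```

Convolving with such a kernel followed by thresholding is a classic way to make binarisation robust to defocus and lens shading.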

OCR on a cell phone: Application

English-Japanese dictionary by OMRON SOFTWARE. Text region chosen by automatic or manual selection.


Other related work with smartphones and PDAs
† PDA & ARToolkit
„ Handheld AR Project (ARToolKitPlus)
† D. Wagner, D. Schmalstieg, "First Steps Towards Handheld Augmented Reality", ISWC 2003.
† http://studierstube.org/handheld_ar/
„ W. Pasman, C. Woodward, "Implementation of an Augmented Reality System on a PDA", ISMAR 2003.
† http://graphics.tudelft.nl/~wouter/

† SmartPhone & {marker-based AR, object recognition}
„ A. Henrysson, M. Billinghurst, M. Ollila, "Face to Face Collaborative AR on Mobile Phones", ISMAR 2005 (7 fps on a Nokia 6630, ARM9 core 210 MHz, Symbian OS).
† http://www.itn.liu.se/~andhe/UMAR/
„ M. Möhring, C. Lessig, O. Bimber, "Video See-Through AR on Consumer Cell-Phones", ISMAR 2004.
† http://www.uni-weimar.de/~bimber/


Marker-based AR on a mobile device
† See-through information display
„ PDA + HMD
„ SynchroReality by OLYMPUS


Embedded Comp: ViewRanger by SGI Japan
† http://www.sgi.co.jp/solutions/bbu/viewranger/

† SH-4 240 MHz, NetBSD, 130 g, 2.5 W
† In: NTSC, AUDIO, RS232C x2; Out: AUDIO
† CF (WiFi, PHS, FOMA), 100Base-TX
† Live video streaming (MPEG) and live audio communication via Apache
† Local logging of video and audio (MPEG4)
† Motion tracking, edge detection, etc.
† Programmable (g++)


Toolkit: IPP, OpenCV
† IPP (Intel Performance Primitives)
„ Signal processing (1-D), image processing (2-D), matrix computing

† OpenCV
„ http://www.intel.com/technology/computing/opencv/index.htm
„ http://sourceforge.net/projects/opencvlibrary/
„ Basic operations
„ Statistics and moments computing
„ Image processing and analysis
„ Structural analysis
„ Motion analysis and object tracking
„ Pattern recognition
„ 3-D reconstruction and camera calibration
„ Graphic interface and acquisition


Toolkit: ARToolkit
† Popular toolkit for marker-based tracking
„ http://www.hitl.washington.edu/artoolkit/ (HITLab Seattle & NZ)
„ H. Kato, M. Billinghurst, "Marker Tracking and HMD Calibration for a Video-based Augmented Reality Conferencing System", IWAR 99.
„ Platforms: from SGI, Linux, Windows to PDA, SmartPhone

† Overview


Coordinate Systems


Toolkit: Georgia Tech Gesture Toolkit (GT2K)
† Supports experiments in HMM-based gesture recognition
„ http://www.gt2k.cc.gatech.edu
„ Platforms: Linux, Unix, Windows via Cygwin (no promises)

† GT2K provides a suite of tools for gesture recognition
„ data preparation, training, validation, recognition

† User provides
„ data, feature vector, model topology and grammar, results interpreter

† When to use GT2K
„ Applications using symbolic/iconic gestures
† sign language, handwriting
„ Not good for tracking applications (e.g. finger mouse)

Pipeline: Data Generator → Feature Vector → Georgia Tech Gesture Toolkit → Classified Gesture → Results Interpreter

Application: American Sign Language translator
† One-way, mobile translation; recognizes basic vocabulary and grammar
„ Helene Brashear, Thad Starner, Paul Lukowicz, Holger Junker, "Using Multiple Sensors for Mobile Sign Language Recognition", ISWC 2003.
„ Hat-mounted camera, wrist-mounted accelerometers
Source: CNN, 2003


GT2K Background
† Cambridge University's HMM (Hidden Markov Model) Toolkit (HTK)
„ http://htk.eng.cam.ac.uk/
„ can be difficult for non-expert users
„ intended for speech recognition
„ users must understand how to relate speech to gesture

† GT2K uses HTK for modeling and recognition
„ abstracts lower-level details of HMMs
„ abstracts speech-specific details of HTK
„ allows quick prototyping of gestural interfaces


GT2K Functionality Overview
† Data preparation
„ design gesture models
„ specify gestures
„ GT2K generates a gesture grammar
„ label data for training
† one gesture per file (automatic labeling)
† multiple gestures per file (assisted labeling)

† Training
„ determines which examples to use for training
† Cross-validation (2/3 training, 1/3 validation)
† Leave-one-out (one validation, rest training)
† User-defined


GT2K Functionality Overview
† Validation
„ Automatically validates the model based on the chosen method (e.g., cross-validation)
„ Reports accuracy
„ Produces a confusion matrix

† Recognition
„ returns the name of the most likely gesture

† Uses the classified gesture to issue commands to devices
„ hand moving up ► volume up
„ hand moving down ► volume down


The Active Vision Approach

Active Vision: "Efficient use of available optical and computational resources" (pioneered by Bajcsy 1988, Ballard and Brown 1992, Aloimonos 1987, et al.)

Examples: 1) A face-tracking camera. 2) A camera that emits IR illumination only when a new observation is needed.


Active vision approach: IR (Infrared) Vision
† The most successful vision system measures only what you want to measure

† IR vision systems simplify image processing
„ Segmentation, detection, tracking
„ Can use invisible targets

† Drawbacks
„ Sunlight, heat sources
„ An additional camera is needed to capture normal images
† could be solved by synchronizing the light flashing and the shutter


IR Vision: Invisible Visual Marker for AR
† Y. Nakazato, M. Kanbara, N. Yokoya, "Discreet markers for user localization", ISWC 2004
„ http://yokoya.naist.jp/index.html
„ Easy infrastructure
„ Does not impair the scenery

× Visual marker → ✓ Invisible visual marker


System Overview

[Figure: system setup] A translucent retro-reflective marker reflects infrared light back to an IR camera with IR-LEDs; the camera feeds a video capture unit (NV-UT200 [NOVAC], USB) attached to the wearable computer (MP-XP7310 [JVC]), which also drives the unit over RS-232C.

IR Vision: Gesture Pendant
† T. Starner, J. Auxier, D. Ashbrook, M. Gandy, "The Gesture Pendant: A Self-illuminating, Wearable, Infrared Computer Vision System for Home Automation Control and Medical Monitoring", ISWC 2000.
† Device for controlling home-automation systems via hand gestures (HMM-based)
† The Gesture Pendant can also analyze the wearer's movements for pathological tremors.
Source: ABC, 2000


IR Vision: Low Vision Aid
† R.C. Bryant, C.M. Lee, R.A. Burstein, E.J. Seibel, "Engineering a low-cost wearable low vision aid based on retinal light scanning", SID 2004
„ http://www.hitl.washington.edu/research/wlva/
† Helps visually impaired people avoid obstacles in their path
† The closest objects appear brightest
† A warning icon is projected into the eye with laser light
† Laser light is steered through a vibrating optical fiber directly onto the user's retina
Source: Seattle Post-Intelligencer, 2004


Active vision approach: Camera motion control

† Wearable Visual Robot (CCD camera, 2D accelerometer, 3 motors). Mayol, Tordoff, Murray, ISWC 2000.

† Wearable Active Camera/Laser (WACL). Sakata, Kurata, Kato, Kourogi, Kuzuoka, ISWC 2003.


Max angular velocity with highest detail = (angle per pixel) / (shutter time)

E.g. a webcam at 640x480 @ 30 Hz with a 42-degree horizontal FOV:
• Max time to collect light is ~30 ms
• Detail starts to blur at just ~2 deg/s
• A person walking 2 m away moves at ~50 deg/s in the image.
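The arithmetic behind the numbers above can be reproduced directly (the walking speed of ~1.75 m/s is my assumption, chosen to match the quoted ~50 deg/s figure):

```python
# Blur-limit arithmetic for the webcam example on this slide.
import math

fov_deg, width_px = 42.0, 640        # horizontal field of view and resolution
shutter_s = 1.0 / 30.0               # ~30 ms light-collection time at 30 Hz
deg_per_px = fov_deg / width_px      # angle subtended by one pixel
blur_limit = deg_per_px / shutter_s  # motion above this smears across pixels
print(round(blur_limit, 2))          # → 1.97 deg/s, i.e. "just ~2 deg/s"

v, d = 1.75, 2.0                     # assumed walking speed (m/s) and distance (m)
ang_speed = math.degrees(v / d)      # small-angle angular speed of the person
print(round(ang_speed, 1))           # → 50.1 deg/s
```

The gap between ~2 deg/s and ~50 deg/s is the quantitative case for active stabilization on the next slides.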


Why an Active Solution?

[Figure: image pairs] Passive camera: slow shutter vs. fast shutter. Active camera: slow shutter vs. fast shutter.

WACL: Wearable Active Camera/Laser
† Camera/laser head on a small pan/tilt platform
„ The laser pointer is almost parallel to the camera
„ The laser spot is close to the center of the video images
„ Easy to miniaturize and reduce weight because of the simple mechanism

† Stabilization capability
„ Based on image registration and inertial sensors
[ISWC 2003]

[WACL1 specification]
Size: 52 x 46 x 45 mm; weight: 100 g
CCD: 270,000-pixel 1/4-inch color; FOV: 49 deg.
Laser pointer: 650 nm, class 2
Motion sensor: InterSense InterTrax2
Microcontroller: H8; actuator: two DC geared motors
Pan/tilt: 270 deg. (pan) and 82 deg. (tilt), ±0.2 deg. (accuracy)
Stabilization accuracy: [sensor] ±2.1 deg., [image] ±0.6 deg.

Section C (WACL & HMC trial)

[Video stills: expert's view and worker's view]


Experiences of camera wearability

Study of collar structures


A tip for building robust prototypes

"Friendly Plastic" by www.amaco.com: a re-usable polyethylene thermoplastic that softens in warm water and becomes hard in cold water.


Relevant computer vision techniques
† Integration with inertial sensors
† Gesture identification
† SLAM: Simultaneous Localization and Mapping


Integration with inertial sensors: Personal Positioning (PP)
† M. Kourogi, T. Kurata, "Personal Positioning based on Walking Locomotion Analysis with Self-Contained Sensors and a Wearable Camera", ISMAR 2003.

† Detect the walking-locomotion cycle and walking direction by analyzing the acceleration/angular-velocity/magnetic vectors
† Dip angle: uniquely determined by the latitude and longitude of the location, used to handle disturbances of the magnetic field

Dead-reckoning module based on walking: accelerometers (3 axes), gyro-sensors (3 axes), magnetometers (3 axes)
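An illustrative sketch in the spirit of the dead-reckoning module above (the step lengths and headings here are invented, not the authors' data): each detected step advances the position estimate along the current heading.

```python
# Toy step-based dead reckoning: accumulate (step length, heading) pairs.
import math

def dead_reckon(steps, x=0.0, y=0.0):
    """steps: list of (step_length_m, heading_deg); heading 0 = north (+y)."""
    for length, heading_deg in steps:
        h = math.radians(heading_deg)
        x += length * math.sin(h)   # east component
        y += length * math.cos(h)   # north component
    return x, y

# Walk 4 steps north, then 3 steps east, with an assumed 0.7 m step length
path = [(0.7, 0.0)] * 4 + [(0.7, 90.0)] * 3
x, y = dead_reckon(path)
print(round(x, 2), round(y, 2))   # → 2.1 2.8
```

The hard parts the PP work addresses are exactly the inputs this toy takes for granted: detecting each step reliably and keeping the heading correct under magnetic disturbances.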


Personal Positioning (PP): Recognize human locomotion behavior

[Figures: sensor traces and scatter plots]
• Walking on a flat floor: vertical and forward acceleration [G] vs. time [msec]; the forward direction is obtained by PCA of the side-forward acceleration while walking.
• Speed estimation: speed [km/h] vs. vertical acceleration gap [G] for subjects A and B, with fitting lines.
• Going up stairs / going down stairs: vertical acceleration and angular velocity (roll) traces.
• Taking an elevator: vertical acceleration trace.

Fast Image Registration: Pseudo-motion vector
† Approximation of the actual 2-D motion vector (u_p, v_p)
„ M. Kourogi, T. Kurata, J. Hoshino, Y. Muraoka, "Real-time Image Mosaicing from a Video Sequence", ICIP '99.

Optical-flow (brightness-constancy) constraint, with I the image brightness:

\frac{\partial I}{\partial x} u_p + \frac{\partial I}{\partial y} v_p + \frac{\partial I}{\partial t} = 0

Pseudo-motion vector: the two components are decoupled and solved independently,

\frac{\partial I}{\partial x} u_p + \frac{\partial I}{\partial t} = 0, \qquad
\frac{\partial I}{\partial y} v_p + \frac{\partial I}{\partial t} = 0

using spatial and temporal brightness gradients between the current frame and the reference frame, with pixel-wise and block-wise matching.
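A toy sketch of the pseudo-motion idea above (assumed 1-D signal and invented data, not the authors' implementation): each displacement component is solved on its own from the spatial and temporal brightness gradients, u_p = −(∂I/∂t)/(∂I/∂x).

```python
# Illustrative 1-D pseudo-motion estimate from image gradients.

def pseudo_motion_1d(prev, curr, x):
    """Estimate displacement at sample x from central spatial gradient."""
    dI_dx = (prev[x + 1] - prev[x - 1]) / 2.0   # spatial gradient
    dI_dt = curr[x] - prev[x]                   # temporal gradient
    return -dI_dt / dI_dx

# A linear ramp shifted right by exactly 2 samples between frames
prev = [float(i) for i in range(10)]
curr = [max(0.0, v - 2.0) for v in prev]
print(pseudo_motion_1d(prev, curr, 5))   # → 2.0 (recovers the true shift)
```

The decoupled estimate is only exact on locally linear brightness like this ramp; on real images it is a fast initial guess that the block-wise matching then refines.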

PP Demo Video (Dead Reckoning only)

At the 2003 International Robot Exhibition, Tokyo Big Sight (3x speed playback)


Indoor Navigation † Dead-Reckoning and Image Registration


Outdoor Navigation † Dead-Reckoning and GPS [map with 50 m scale bar]


Integration with inertial sensors: AirGrabber
† M. Fujimoto, M. Imura, Y. Yasumuro, Y. Manabe, K. Chihara, "AirGrabber: Virtual Keyboard using Miniature Camera and Tilt Sensor", Trans. VRSJ, Vol. 9, No. 4, 2004 (in Japanese).
† http://chihara.naist.jp/research/research_UC.html

[Figure: wrist-worn device with attitude sensor, fish-eye camera, and IR LED]


Gesture identification
† Gestures can be explicit or not.
† Most work for wearables has, perhaps not surprisingly, concentrated on hands.


Skin detection

Bounding-box method: a fixed "skin zone" in the colour plane (e.g. UV) separates skin colour from non-skin.

Probabilistic method: Bayes' rule gives the class posterior,

p(\mathrm{class}_i \mid color) = \frac{p(color \mid \mathrm{class}_i)\, p(\mathrm{class}_i)}{\sum_{j=1}^{N} p(color \mid \mathrm{class}_j)\, p(\mathrm{class}_j)}

Note: other colour spaces such as RGB, HSV, etc. are also in use.
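A minimal sketch of the probabilistic skin classifier above. The class-conditional probabilities and priors below are invented toy numbers, not trained histograms.

```python
# Illustrative Bayes-rule skin/non-skin posterior over quantised colour bins.

def posterior(color, likelihoods, priors):
    """Bayes' rule: p(class_i | color) for every class."""
    joint = {c: likelihoods[c].get(color, 1e-6) * priors[c] for c in priors}
    total = sum(joint.values())
    return {c: joint[c] / total for c in joint}

# Toy class-conditional probabilities of a quantised (U, V) colour bin
likelihoods = {
    "skin":     {(5, 7): 0.60, (2, 2): 0.05},
    "non_skin": {(5, 7): 0.02, (2, 2): 0.40},
}
priors = {"skin": 0.3, "non_skin": 0.7}

post = posterior((5, 7), likelihoods, priors)
print(post["skin"] > 0.9)   # this bin is overwhelmingly skin → True
```

In a real system the likelihood tables come from labelled pixel histograms, and the posterior is thresholded per pixel to produce the segmentation shown on the next slide.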


Skin detection

[Figure: colour image → skin segmentation → image filtered with a 5x5 averaging mask]


ASL Recognition
Starner, Weaver & Pentland, ISWC 1997

„ Skin images converted to a vector of characteristics (e.g. hand position in the image, dimension and orientation of the bounding ellipse)
„ ~97% accuracy over a 40-word vocabulary, using grammar as a constraint.


ASL Recognition
Starner & Pentland, ISCV 95


SLAM: Simultaneous Localization and Mapping
"How to know in real time where the sensor AND objects are, without any positioning prior"
• Key problem in robots and wearables.
• Refinement of Structure From Motion:
• Demands causality (no bundle adjustment).


SLAM: Simultaneous Localization and Mapping
Single-camera SLAM: feature uncertainty

Key aim: not just account for uncertainty but reduce it.
See work by Andrew Davison: http://www.doc.ic.ac.uk/~ajd


Davison’s Algorithm for monocular SLAM

Sensor’s state: 3D position, 3D orientation (quaternion), 3D velocity, 3D angular velocity.
Feature’s state: 3D position, y_i = (x_i, y_i, z_i)^T.
“Full covariance” EKF.

A. J. Davison, "Real-Time Simultaneous Localisation and Mapping with a Single Camera", ICCV 2003.
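A 1-D toy of the "full covariance" EKF mechanics referenced above (not Davison's 13-dimensional state; all numbers invented): constant-velocity prediction grows the uncertainty, and a measurement update shrinks it again.

```python
# Illustrative scalar Kalman filter step: predict, then update.

def ekf_predict(x, P, v, dt, q):
    """Propagate position with velocity v; inflate covariance by process noise q."""
    return x + v * dt, P + q

def ekf_update(x, P, z, r):
    """Fuse a direct position measurement z with variance r."""
    K = P / (P + r)                          # Kalman gain for h(x) = x
    return x + K * (z - x), (1.0 - K) * P    # corrected state, shrunk covariance

x, P = 0.0, 0.5                              # initial state and covariance
x, P_pred = ekf_predict(x, P, v=1.0, dt=1.0, q=0.1)
x, P_post = ekf_update(x, P_pred, z=1.2, r=0.2)
print(P_pred > 0.5 and P_post < P_pred)      # grows, then shrinks → True
```

In the monocular SLAM case the same predict/update cycle runs over the joint camera-plus-features state, so the covariance matrix also carries the correlations between the camera and every mapped feature.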


Davison’s Algorithm for monocular SLAM

Update Model

A. J. Davison, "Real-Time Simultaneous Localisation and Mapping with a Single Camera", ICCV 2003.


Davison’s Algorithm for monocular SLAM

Use a calibration object to get initial scale


Davison’s Algorithm for monocular SLAM

Shi-Tomasi "corner" operator to detect subsequent features; a particle filter estimates feature depth.
Images from Davison, ICCV 2003.


SLAM applied to the Wearable Robot

Davison, Mayol, Murray, ISMAR 2003


Applications
† Hand gesture-based visual interface
† Wearable Augmented Reality
† Wearable Augmented Memory
† Remote collaboration with AR and 3D map building
† Real World Coupled Visual Interface Development Kit


Hand gesture-based interface: WVT: Wearable Virtual Tablet
† N. Ukita, M. Kidode, "Wearable Virtual Tablet: Fingertip Drawing on a Portable Plane-Object using an Active-Infrared Camera", IUI 2004.
„ http://ai-www.aist-nara.ac.jp/

† Basic scheme: fingertip drawing on a plane using an infrared camera
1. Input-plane extraction (Hough transform)
2. Fingertip detection (arc [semi-circle] detection)
3. Discrimination between input/non-input (distance, shade of the fingertip)


Wearable Augmented Reality: Outdoor See-Through Vision
† Y. Kameda, T. Takemasa, Y. Ohta, "Outdoor See-Through Vision Utilizing Surveillance Cameras", ISMAR 2004.
„ http://www.image.esys.tsukuba.ac.jp/Eindex.html

† Initial estimation employing a GPS and compass
† Tracking pose changes with an inertial sensor
† Pose adjustment by referring to landmarks (natural-feature detection)


Wearable Augmented Reality: AR Manual Authoring
† T. Okuma, T. Kurata, K. Sakaue, "Fiducial-less 3-D Object Tracking in AR Systems Based on the Integration of Top-down and Bottom-up Approaches and Automatic Database Addition", ISMAR 2003.


Wearable Augmented Memory: Lifelog
† "Lifelog" project proposed by DARPA, 2003
† Lifelog research by Aizawa-lab, Univ. of Tokyo
„ http://www.hal.t.u-tokyo.ac.jp/
„ 70 years of recording will be available later on!?
† 70 x 365 x 16 x 60 x 60 sec x 1 Mbps = 184 TB
„ But how to watch it? Retrieval, indexing, summarization.
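The slide's storage arithmetic checks out, assuming decimal terabytes:

```python
# 70 years of 16-hour days captured at 1 Mbit/s.
seconds = 70 * 365 * 16 * 60 * 60        # seconds of recording
bits = seconds * 1_000_000               # at 1 Mbit/s
terabytes = bits / 8 / 1e12              # bits → bytes → TB (decimal)
print(round(terabytes))                  # → 184, matching the slide
```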


Remote collaboration with AR and 3D map building


Remote collaboration with AR and 3D map building

Davison, Mayol, Murray, ISMAR 2003


Weavy and RWCVI-DK
† AIST has developed a commercial RWCVI-DK (Real World Coupled Visual Interface Development Kit) in cooperation with Media Drive Corporation that includes:
„ Personal Positioning (PP) with self-contained sensors
„ Hand-Gesture (HG) Interface
„ Real-World OCR (RWOCR)
„ Weavy: Wearable Visual Interface Project
„ http://www.is.aist.go.jp/
„ Media Drive: http://adv.mediadrive.jp/product/link_visual/index.html (in Japanese)


Hand gesture-based interface on the Weavy
† T. Kurata, T. Kato, M. Kourogi, J. Keechul, K. Endo, "A Functionally-Distributed Hand Tracking Method for Wearable Visual Interfaces and Its Applications", MVA 2002.

† The Distributed Monte Carlo (DMC) tracking method
„ An extension of the Condensation algorithm for heterogeneous distributed architectures
„ Different numbers of samples and different dynamical models on the wearable and infrastructure sides respectively

† The Functionally-Distributed (FD) hand tracking method
„ Combination of the DMC tracking method and hand-colour modeling with a GMM


Applications of Hand gesture-based interface

RWOCR with HG

Secure PIN Input with HG


Social Issues with wearable cameras
† Cameras have been around for some time and acceptability is evolving.
† Different countries (and sometimes states) have different legislation, mainly related to pictures of people.
† The problem is not the cameras themselves but how you network them [Pentland 2000].
† Any comments on this?


Thank You, Gracias, Arigato!!
† Contact Info
„ Walterio Mayol
† [email protected]
† http://www.cs.bris.ac.uk/~wmayol/
„ Takeshi Kurata
† [email protected]
† http://www.is.aist.go.jp/kurata/
† http://unit.aist.go.jp/itri/itri-rwig/
