Robust Recognition in Sequential Images

Xiang Xiang, 5th-year PhD candidate in Computer Science, Johns Hopkins University

Vision is hard

● Problems
● Models
● Next

Here, "sequential images" refers to videos (2D images over time), hyperspectral images (2D images over channels), medical 3D scans (2D images over slices), and so on.

High-level: action & event

Datasets

Small, video-level labels (action recognition):
● UCF101 (101 classes, ~13,000 short videos)
● HMDB51 (51 classes, ~7,000 short videos)
● ActivityNet (200 classes, ~20,000 videos, some with frame-level labels)

Large, noisy video-level labels (video classification):
● Sports-1M (487 classes, ~1.2M YouTube videos)
● YouTube-8M (4,800 classes, 8 million YouTube videos)

Mid-level: tracking

Under short-term occlusion

Multiple Instance Learning (MIL) tracker + Particle Filter (PF). ACCV 2012: 134-146, Daejeon, Korea.
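The particle-filter side of such a tracker can be sketched in one dimension. This is a minimal, illustrative bootstrap filter under assumed Gaussian motion and observation models, not the paper's implementation; all names and parameters are made up for the sketch.

```python
import numpy as np

def particle_filter_track(observations, n_particles=500, proc_std=1.0,
                          obs_std=1.0, seed=0):
    """Minimal bootstrap particle filter for a 1-D target position.
    Each step: diffuse particles (random-walk motion model), weight
    them by a Gaussian observation likelihood, resample, and report
    the posterior mean. `observations` is a 1-D sequence of noisy
    position measurements (e.g. per-frame detector outputs)."""
    rng = np.random.default_rng(seed)
    particles = np.full(n_particles, observations[0], dtype=float)
    estimates = []
    for z in observations:
        # Predict: random-walk motion model
        particles = particles + rng.normal(0.0, proc_std, n_particles)
        # Update: Gaussian observation likelihood around measurement z
        w = np.exp(-0.5 * ((particles - z) / obs_std) ** 2)
        w /= w.sum()
        # Resample (multinomial) according to the weights
        particles = rng.choice(particles, n_particles, p=w)
        estimates.append(particles.mean())
    return np.array(estimates)
```

In a real appearance tracker the scalar state would be a bounding-box state (position, scale) and the likelihood would come from the MIL classifier's score rather than a Gaussian around a measurement.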

Tracking-Learning-Detection (TLD) tracker

Mid-level: segmentation

Hopkins 155 dataset

Low-level: robust feature matching

MICCAI 2014 Workshop on Computer-Assisted and Robotic Endoscopy, Boston, USA. LNCS 8899, 88-98.

Dynamics to the rescue

● Problems
● Models
● Next

A 157-second video from YouTube.

High-level: keyframe picking for storytelling

Keyframes, skims, storyboards, time-lapses, montages, or video synopses.

Robust PCA
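Robust PCA decomposes a data matrix M into a low-rank part L (e.g. background, or the dominant appearance) plus a sparse part S (e.g. moving objects or corruptions). The sketch below follows the standard inexact augmented Lagrange multiplier (ALM) solver for Principal Component Pursuit; it is a minimal illustrative version, with all parameter defaults taken from the common convention (λ = 1/√max(m, n)), not a tuned implementation.

```python
import numpy as np

def robust_pca(M, lam=None, rho=1.5, tol=1e-7, max_iter=200):
    """Principal Component Pursuit via inexact ALM:
    decompose M = L + S with L low-rank and S sparse."""
    m, n = M.shape
    lam = lam if lam is not None else 1.0 / np.sqrt(max(m, n))
    norm_M = np.linalg.norm(M)
    mu = 1.25 / np.linalg.norm(M, 2)   # penalty parameter
    S = np.zeros_like(M)
    Y = np.zeros_like(M)               # Lagrange multipliers
    for _ in range(max_iter):
        # Low-rank step: singular value thresholding at 1/mu
        U, sig, Vt = np.linalg.svd(M - S + Y / mu, full_matrices=False)
        L = (U * np.maximum(sig - 1.0 / mu, 0.0)) @ Vt
        # Sparse step: elementwise soft thresholding at lam/mu
        T = M - L + Y / mu
        S = np.sign(T) * np.maximum(np.abs(T) - lam / mu, 0.0)
        # Dual update and penalty growth
        Z = M - L - S
        Y = Y + mu * Z
        mu *= rho
        if np.linalg.norm(Z) / norm_M < tol:
            break
    return L, S
```

Applied to a video, each column of M is a vectorized frame: L recovers the static background and S the moving foreground.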

Robust Sparse Representation

OMP (Orthogonal Matching Pursuit): a greedy forward sequential selection method.
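OMP greedily builds the support one atom at a time: pick the dictionary column most correlated with the current residual, then re-fit all selected atoms by least squares. A minimal sketch (function name and argument layout are illustrative):

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit.
    D: (m, n) dictionary with unit-norm columns; y: (m,) signal;
    k: target sparsity. Returns a k-sparse coefficient vector x
    with y ≈ D @ x."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(k):
        # Greedy selection: atom most correlated with the residual
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Orthogonal step: least-squares refit on the active support
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x
```

The least-squares refit is what distinguishes OMP from plain Matching Pursuit: after each selection, the residual is orthogonal to every atom already chosen.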

IEEE ICASSP 2015, Brisbane, Australia. In preparation for IEEE Trans. Affective Computing. Code available at https://github.com/eglxiang/icassp15_emotion

Pose-Robust Deep Representation

Challenges:
● Identification involves one-to-many similarities.
● Pose variation in uncontrolled environments confuses identity.
● Processing a video stream is computationally expensive.

Approach:
● K-means clustering of the poses, estimated as rotation angles.
● Frame selection using distances to the K-means centroids.
● Pros: reduces the number of frames from tens or hundreds to K while still preserving the overall diversity.
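The selection step above can be sketched with plain Lloyd's K-means over per-frame pose angles; this is an illustrative minimal version (function name, arguments, and the tie-breaking are assumptions, not the paper's code):

```python
import numpy as np

def select_keyframes(poses, k, n_iter=50, seed=0):
    """Pick k representative frames by K-means clustering the
    per-frame head poses (e.g. yaw/pitch/roll angles) and keeping,
    for each cluster, the frame closest to its centroid.
    `poses` is an (n_frames, d) float array."""
    rng = np.random.default_rng(seed)
    # Initialize centroids with k distinct frames
    centroids = poses[rng.choice(len(poses), k, replace=False)]
    for _ in range(n_iter):
        # Assign each frame to its nearest centroid
        d = np.linalg.norm(poses[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        # Update centroids (keep the old one if a cluster empties)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = poses[labels == j].mean(axis=0)
    # Keyframe = the frame nearest to each final centroid
    d = np.linalg.norm(poses[:, None] - centroids[None], axis=2)
    return sorted(set(d.argmin(axis=0)))
```

Downstream face matching then runs the expensive CNN descriptor on only these K frames instead of every frame of the stream.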

YouTube Faces: 3,425 videos of 1,595 subjects. Benchmark tests: an official list of 5000 pairs of videos.

Pipeline:
● Face detection: AdaBoost (OpenCV / DLib).
● Pose estimation via landmark detection: DLib.
● Face alignment (OpenCV): center eyes & mouth; affine warping.
● Face feature descriptor: pre-trained CNN (VGG-Face).
● Face similarity metric: correlation (max correlation).
Code available at https://github.com/eglxiang/ytf
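The affine-warping step in the alignment stage amounts to solving a small least-squares problem: find the 2x3 affine transform that maps the detected landmarks (eyes, mouth) onto canonical template positions. A numpy-only sketch of that computation (the surrounding pipeline uses OpenCV/DLib; these helper names are illustrative):

```python
import numpy as np

def alignment_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping detected
    landmarks `src_pts` to canonical template positions `dst_pts`.
    Both are (n, 2) arrays with n >= 3 non-collinear points."""
    n = len(src_pts)
    # Homogeneous coordinates: each row is [x, y, 1]
    X = np.hstack([src_pts, np.ones((n, 1))])
    A, *_ = np.linalg.lstsq(X, dst_pts, rcond=None)
    return A.T  # shape (2, 3): dst ≈ A @ [x, y, 1]

def apply_affine(A, pts):
    """Apply a 2x3 affine transform to (n, 2) points."""
    X = np.hstack([pts, np.ones((len(pts), 1))])
    return X @ A.T
```

In the actual pipeline the resulting matrix would be handed to an image-warping routine (e.g. OpenCV's warpAffine) to resample the face crop.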

Mid-level: tracking

On the YouTube dataset with 50 sequences. A motion model helps!

(a) On-line boosting with fixed scale and Haar-like, HOG & LBP features.
(b) Semi-supervised on-line boosting with fixed scale and Haar-like, HOG & LBP features.
(c) MIL-Track with fixed scale and Haar-like features.
(d) MIL-Track with fixed scale and HOG features.
(e) TLD tracker with adaptive scale and a simply designed feature.
(f) Basic template matching.
(g) Basic mean shift.
(h) Frag-Track.
(i) KLT optical flow; each corner point is drawn with an arrow (motion vector).
(j) Particle filter with all samples drawn.
(k) Frame-by-frame detection results of a state-of-the-art human detector, the Deformable Part Model.

Xiang, Xiang. "A brief review on visual tracking methods." Third Chinese Conference on Intelligent Visual Surveillance (IVS), IEEE, 2011.

Mid-level: segmentation

J. Niebles, B. Han, A. Ferencz, and L. Fei-Fei. Extracting moving people from Internet videos. In ECCV, 2008. J. Niebles, B. Han, and L. Fei-Fei. Efficient extraction of human motion volumes by tracking. In CVPR, 2010.

Low-level: motion feature

From image to video: from dense SIFT to dense optical flow.
Image matching: SIFT Flow. Bottom-up approach: low-level cues.

video matching?

Motion: optical flow.
Occurrence of motion: histogram (bag of visual words).
Spatiotemporal feature: flow words. SIFT-style descriptors computed on patches over time, on the optical flow (velocity, motion gradient) rather than on image gradients.
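The "occurrence of motion" histogram can be sketched by quantizing each per-pixel flow vector into orientation bins, plus a "no motion" bin for small magnitudes. This is a minimal illustrative binning, not the codebook used in any particular paper:

```python
import numpy as np

def flow_word_histogram(flow, n_orient=8, mag_thresh=1.0):
    """Bag-of-flow-words sketch: quantize each flow vector (u, v)
    into one of `n_orient` orientation bins (bin 0 is reserved for
    near-zero motion) and return the normalized histogram.
    `flow` is an (H, W, 2) array of per-pixel (u, v) velocities."""
    u, v = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(u, v)
    ang = np.arctan2(v, u)  # in [-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_orient).astype(int) % n_orient
    hist = np.zeros(n_orient + 1)
    moving = mag >= mag_thresh
    hist[0] = (~moving).sum()             # static pixels
    np.add.at(hist[1:], bins[moving], 1)  # moving pixels by direction
    return hist / hist.sum()
```

Concatenating such histograms over a spatiotemporal grid of patches gives a simple bag-of-visual-words video descriptor.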

Horn and Schunck. Determining Optical Flow. MIT, 1981.
Michael Black. Robust Incremental Optical Flow. Yale, 1992.
Pickup, L. C., Pan, Z., Wei, D., Shih, Y., Zhang, C., Zisserman, A., and Freeman, W. T. Seeing the Arrow of Time. CVPR 2014.
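The classic Horn-Schunck formulation cited above (brightness constancy plus a smoothness term) can be sketched as a fixed-point iteration; this is a minimal illustrative version for two grayscale frames, not a production solver (no pyramid, no robust penalty):

```python
import numpy as np

def horn_schunck(I1, I2, alpha=0.5, n_iter=100):
    """Minimal Horn-Schunck optical flow between two grayscale
    float images. alpha weights the smoothness term; returns the
    per-pixel horizontal (u) and vertical (v) flow components."""
    Ix = np.gradient(I1, axis=1)   # spatial gradients
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1                   # temporal gradient
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)

    def avg(f):
        # Edge-padded 4-neighbor average (the smoothness coupling)
        p = np.pad(f, 1, mode="edge")
        return (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4

    for _ in range(n_iter):
        ua, va = avg(u), avg(v)
        common = (Ix * ua + Iy * va + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ua - Ix * common
        v = va - Iy * common
    return u, v
```

The smoothness term is what lets the flow propagate into textureless regions where brightness constancy alone is under-constrained.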

Perform optical flow computation for vertical and horizontal image gradients separately

[Figure: vertical flow component for frame t and frame t+1.]

Patch size (spatial bin size): 6 pixels (4×4 grid).

Motion gradient



Moving forward: high-level video analytics is still largely an open problem.

Questions? Thanks for your attention! [email protected]
