Intelligent Video Surveillance in Crowded Scenes Xiaogang Wang Department of Electronic Engineering The Chinese University of Hong Kong

Intelligent Video Surveillance

- The number of surveillance cameras is increasing fast
  - The Heathrow airport in London has 5,000 surveillance cameras
  - By 2009, China had installed more than 3,000,000 surveillance cameras. This number will increase by more than 40% per year in the next five years.

- Applications
  - Homeland security
  - Anti-crime
  - Traffic control
  - Monitor children, elderly and patients at home
  - …

- Functions
  - Low-level: detect, track and recognize objects of interest
  - High-level: understand activities of objects and detect abnormalities

[Example scenes: sparse scenes and crowded scenes, single camera view and multiple camera views]

"The requirements for the next generation of video surveillance systems are robustness, reliability, scalability and self-adaptability for crowded, large and complex scenes." (Remagnino et al., Machine Vision and Applications, 2007)

Conventional Approaches for Activity Analysis

- Detection-and-tracking based
  - Detection and tracking are unreliable in crowded scenes because of occlusions

[Pipeline: object detection and tracking → trajectories of objects → activity analysis → typical activity categories / abnormal activities]

Conventional Approaches for Activity Analysis

- Motion based
  - Cannot separate co-occurring activities (e.g. walk, wave, run)

[Pipeline: video sequence → divide into short video clips → motion feature vector per clip → activity analysis]
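The clip representation sketched above can be made concrete as a bag-of-words over quantized motion: each moving pixel becomes a discrete "word" from its spatial cell and flow direction, and a clip becomes a word histogram. The grid size, number of direction bins, and frame size below are illustrative assumptions, not the exact quantization used in the talk.

```python
import numpy as np

def quantize_motion(positions, directions, grid=(8, 8), n_dirs=4,
                    frame_size=(240, 320)):
    """Map each moving pixel (position + flow direction) to a codebook word.

    Codebook = spatial cell index x quantized motion direction; the
    cell/direction counts here are illustrative.
    """
    h, w = frame_size
    rows = np.clip((positions[:, 0] * grid[0]) // h, 0, grid[0] - 1)
    cols = np.clip((positions[:, 1] * grid[1]) // w, 0, grid[1] - 1)
    # quantize direction (radians in [-pi, pi]) into n_dirs bins
    d = ((directions + np.pi) / (2 * np.pi) * n_dirs).astype(int) % n_dirs
    return (rows * grid[1] + cols) * n_dirs + d

def clip_histogram(words, vocab_size):
    """Bag-of-words histogram for one short video clip."""
    return np.bincount(words, minlength=vocab_size)

# toy example: 5 moving pixels in a 240x320 frame
pos = np.array([[10, 10], [10, 12], [120, 160], [230, 310], [230, 315]])
dirs = np.array([0.0, 0.1, np.pi / 2, np.pi, -np.pi / 2])
words = quantize_motion(pos, dirs)
hist = clip_histogram(words, vocab_size=8 * 8 * 4)
```

Unlike a single per-clip motion feature vector, the histogram keeps each pixel's word separate, which is what lets a mixture model assign co-occurring pixels to different activities.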

Features of our Approach

- Detection and tracking are not required
- Separates co-occurring activities
- Works robustly in crowded scenes
- Unsupervised (no need to manually label training data)
- Simultaneously models simple activities, more complicated interactions among objects, and global behaviors in the scene

X. Wang, X. Ma and E. Grimson, “Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, pp. 539-555, 2009.

High-level picture of our approach

[Figure: (a) motion features; (b) atomic activities modeled as distributions over the feature codebook; (c) global behaviors modeled as distributions over atomic activities]

Parametric Hierarchical Bayesian Model

[Plate diagram, with hyperparameters μ and η: each video clip j (j = 1…M) has a global behavior label c_j; c_j selects a global behavior model β_c (L = 2 behaviors, with hyperparameter β_0), which generates the clip's mixture π_j over atomic activities; each moving pixel i in clip j draws an atomic activity label z_ji from π_j and an observed feature value x_ji from the atomic activity model φ_k (K = 4, with prior H); plates over the N_j moving pixels per clip and the M clips. π_j captures the temporal co-occurrence of moving pixels in a short video clip; φ_k captures the spatial distribution of an atomic activity.]
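The generative process encoded by this plate diagram can be sketched as follows. The dimensions (L = 2 global behaviors, K = 4 atomic activities) follow the slides; the vocabulary size, Dirichlet parameters, and uniform behavior prior are illustrative assumptions, and inference (e.g. Gibbs sampling) is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_clip(n_pixels, behavior_priors, phi, rng):
    """Generative process for one video clip, LDA-style:
      1. pick a global behavior c_j (uniform prior assumed here)
      2. draw the clip's mixture over atomic activities pi_j ~ Dirichlet(beta_c)
      3. for each moving pixel, draw an atomic activity z_ji, then a feature
         word x_ji from that activity's distribution phi
    """
    L = len(behavior_priors)
    c = rng.integers(L)                                   # global behavior c_j
    pi = rng.dirichlet(behavior_priors[c])                # mixture over activities
    z = rng.choice(len(pi), size=n_pixels, p=pi)          # atomic activity per pixel
    x = np.array([rng.choice(phi.shape[1], p=phi[k]) for k in z])  # feature words
    return c, z, x

K, V, L = 4, 20, 2                                        # V = 20 words is illustrative
phi = rng.dirichlet(np.ones(V), size=K)                   # activity -> word distributions
beta = np.array([[8.0, 8.0, 0.5, 0.5],                    # behavior 1 favors activities 1, 2
                 [0.5, 0.5, 8.0, 8.0]])                   # behavior 2 favors activities 3, 4
c, z, x = generate_clip(100, beta, phi, rng)
```

Because π_j couples all pixels in a clip, atomic activities that co-occur under the same global behavior reinforce each other during inference, which is what ties the two levels of the hierarchy together.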

Experimental Results

- The input is a 90-minute-long traffic video sequence

Learned atomic activities from a traffic scene

[29 learned atomic activities, shown as panels (1)–(29)]

Global behavior I: green light for south/north traffic

[Prior over atomic activity indices; dominant activities: vehicles northbound (two activities), vehicles southbound, vehicles incoming northbound, vehicles incoming southbound, vehicles outgoing eastbound]

Global behavior II: green light for east/west traffic

[Prior over atomic activity indices; dominant activities: vehicles incoming westbound, vehicles outgoing westbound, vehicles outgoing southbound, vehicles incoming eastbound, vehicles outgoing eastbound, pedestrians westbound]

Global behavior III: left turn signal for east/west traffic

[Prior over atomic activity indices; dominant activities: vehicles turning left eastbound, vehicles outgoing northbound (two activities), vehicles incoming eastbound, vehicles outgoing eastbound, vehicles stopping southbound]

Global behavior IV: walk sign

[Prior over atomic activity indices; dominant activities: pedestrians incoming eastbound, pedestrians outgoing eastbound, pedestrians westbound (two activities), vehicles stopping (two activities)]

Global behavior V: northbound right turns

[Prior over atomic activity indices; dominant activities: vehicles incoming northbound, vehicles outgoing eastbound]

Temporal Video Segmentation

[Timeline segmented into the five global behaviors: green light for east/west traffic, walk sign, green light for south/north traffic, northbound right turns, left turn signal for east/west traffic]

Confusion Matrix of Video Segmentation

[Confusion matrix: clustering result vs. manual label]

- The average accuracy is 85.74% using our approach.
- The average accuracy is 65.6% when modeling atomic activities and global behaviors in two separate steps.
- Approaches that represent a video clip with a single motion feature vector, such as Zhong et al. CVPR'04, perform poorly on this data.

Abnormality detection results

Interaction Query

[Query distribution over atomic activity indices specifying an interaction: vehicles approaching + pedestrians crossing the street; top four retrieved jay-walking examples shown]
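The query above can be answered by comparing the query's distribution over atomic activities with each clip's inferred activity mixture and returning the closest clips. The similarity measure here (KL divergence) and the toy mixtures are illustrative assumptions; the talk does not specify the exact ranking function.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-10):
    """KL(p || q) between two discrete distributions over atomic activities."""
    p = p + eps
    q = q + eps
    return float(np.sum(p * np.log(p / q)))

def rank_clips(query, clip_mixtures):
    """Return clip indices ordered by how well each clip's activity mixture
    matches the query distribution (smaller KL = better match)."""
    scores = [kl_divergence(query, m) for m in clip_mixtures]
    return np.argsort(scores)

# toy example: the query emphasizes activities 0 and 3 (say, vehicles
# approaching + pedestrians crossing); clip 2 matches best
query = np.array([0.5, 0.0, 0.0, 0.5])
clips = np.array([[0.9, 0.1, 0.0, 0.0],
                  [0.1, 0.4, 0.4, 0.1],
                  [0.45, 0.05, 0.05, 0.45]])
order = rank_clips(query, clips)   # order[0] == 2
```

Because a clip is a mixture over activities rather than a single label, the query can require two activities to co-occur, which is exactly what retrieving an interaction like jay-walking needs.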

Pedestrians/Vehicles Detection Based on Motions

Atomic activities related to vehicles

Pedestrians/Vehicles Detection Based on Motions

Atomic activities related to pedestrians

Classify Motions into Vehicles (Red) and Pedestrians (Green)

Conclusion

- Proposes an unsupervised approach for robust activity analysis in crowded scenes
- Co-occurring activities are separated without detecting and tracking objects
- Using only moving pixels as features, this approach is able to
  - detect activities
  - analyze interactions among objects
  - temporally segment video sequences into global behaviors
  - detect abnormalities
  - classify motions into pedestrians and vehicles

Face Sketch Synthesis and Recognition Xiaogang Wang Department of Electronic Engineering The Chinese University of Hong Kong

Outline

- Applications
- CUHK face sketch database
- Face sketch synthesis using a global linear model
- Patch-based face sketch synthesis using multi-scale Markov random fields
- Face sketch recognition

Applications

- Law enforcement
- Film industry
- Entertainment

[Figure: a query sketch drawn by the artist, face photos in the police mug-shot databases, and sketches synthesized by computer]

CUHK Face Sketch Database (CUFS)

- Publicly available: http://mmlab.ie.cuhk.edu.hk/facesketch.html
- 188 people from the CUHK student data set

- 123 people from the AR database

- 295 people from the XM2VTS database

Learning-Based Face Sketch Synthesis

- Generate a sketch from an input face photo based on a set of training face photo-sketch pairs
- Sketches of different styles can be synthesized by choosing training sets of different styles

[Figure: input face photo + training set → face sketch synthesis → synthesized sketch]

Face Sketch Synthesis Using a Global Linear Model

[Figure: the input photo is approximated as a linear combination c_1 p_1 + … + c_n p_n of the training photos p_1 … p_n; the same coefficients transform the paired training sketches s_1 … s_n, so the synthesized sketch is c_1 s_1 + … + c_n s_n]

X. Tang and X. Wang, "Face Sketch Recognition," IEEE Trans. on Circuits and Systems for Video Technology (CSVT), Vol. 14, No. 1, pp. 50-57, 2004.
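The coefficient-transfer principle behind the global linear model can be sketched in a few lines. This minimal version derives the coefficients by raw least squares; the published method uses an eigentransform, so treat this as an illustration of the idea, not the actual algorithm.

```python
import numpy as np

def synthesize_sketch(photo, train_photos, train_sketches):
    """Global linear idea: express the input photo as a linear combination of
    the training photos (coefficients c), then apply the SAME coefficients
    to the paired training sketches.

    Least squares stands in for the eigentransform used in the paper.
    """
    P = np.stack([p.ravel() for p in train_photos], axis=1)   # pixels x n
    S = np.stack([s.ravel() for s in train_sketches], axis=1)
    c, *_ = np.linalg.lstsq(P, photo.ravel(), rcond=None)     # photo ≈ P @ c
    return (S @ c).reshape(train_sketches[0].shape)           # sketch = S @ c

# toy example: pretend "sketches" are photos with inverted intensity, and the
# new photo is an exact blend of two training photos
rng = np.random.default_rng(1)
photos = [rng.random((4, 4)) for _ in range(6)]
sketches = [1.0 - p for p in photos]
new_photo = 0.5 * photos[0] + 0.5 * photos[1]
result = synthesize_sketch(new_photo, photos, sketches)
```

In this toy setup the blend coefficients sum to one, so the synthesized "sketch" comes out as the inverted blend, mirroring how coefficients learned in photo space carry over to sketch space.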

Results

[Figure: photo, sketch drawn by the artist, and synthesized sketch for two examples]

Separate Shape and Texture

[Figure: graph matching separates the photo into photo shape and photo texture; a shape transformation maps photo shape to sketch shape, a texture transformation maps photo texture to sketch texture, and combining them yields the synthesized sketch]

X. Tang and X. Wang, "Face Sketch Synthesis and Recognition," in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2003.

Results

[Figure: for several examples, the photo, the sketch synthesized without separation, the sketch synthesized with shape and texture separated, and the sketch drawn by the artist]

Patch-Based Face Sketch Synthesis Using Multi-Scale Markov Random Fields

X. Wang and X. Tang, “Face Photo‐Sketch Synthesis and Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, pp. 1955-1967, 2009.
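The core of the patch-based approach can be sketched as follows. This simplified version picks the best-matching training photo patch for each input patch independently and copies its paired sketch patch; the full method additionally links neighbouring patches in a multi-scale Markov random field and resolves them jointly with belief propagation, so the independent nearest-neighbour lookup here is a deliberate simplification.

```python
import numpy as np

def patch_synthesize(photo, train_pairs, psz=4):
    """For every non-overlapping photo patch, find the closest training photo
    patch (data term only) and copy the paired sketch patch into the output.
    The MRF smoothness term between neighbouring patches is omitted here.
    """
    # build a dictionary of (photo patch, sketch patch) pairs from training data
    photo_patches, sketch_patches = [], []
    for p, s in train_pairs:
        for i in range(0, p.shape[0] - psz + 1, psz):
            for j in range(0, p.shape[1] - psz + 1, psz):
                photo_patches.append(p[i:i+psz, j:j+psz].ravel())
                sketch_patches.append(s[i:i+psz, j:j+psz])
    D = np.stack(photo_patches)

    out = np.zeros_like(photo)
    for i in range(0, photo.shape[0] - psz + 1, psz):
        for j in range(0, photo.shape[1] - psz + 1, psz):
            q = photo[i:i+psz, j:j+psz].ravel()
            best = np.argmin(((D - q) ** 2).sum(axis=1))   # nearest photo patch
            out[i:i+psz, j:j+psz] = sketch_patches[best]   # copy paired sketch patch
    return out

# toy example: one training pair whose "sketch" is the inverted photo; querying
# the training photo itself copies back its inverted patches
rng = np.random.default_rng(2)
p = rng.random((8, 8))
out = patch_synthesize(p, [(p, 1.0 - p)], psz=4)
```

The MRF compatibility term that this sketch omits is what suppresses seams between independently chosen patches, which is why the full method needs several belief-propagation iterations to converge.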

Results

[Figure: photo, sketch drawn by the artist, and synthesized sketches after 0, 5 and 40 iterations of belief propagation on the Markov random fields]

[Figure: comparison of the global linear transform and the patch-based method against sketches drawn by the artist, for several photos]

Sketch Synthesis with Lighting Variations

Sketch Synthesis with Pose Variations

Synthesize Photos from Sketches

[Figure: sketch by the artist, synthesized photo, and original photo for two examples]

Face Sketch Recognition

- 306 people for training and 300 people for testing

Table 1. Rank 1-10 recognition accuracy using different face sketch synthesis methods (%)

Methods                 |    1 |    2 |    3 |    4 |    5 |    6 |    7 |    8 |    9 |   10
------------------------|------|------|------|------|------|------|------|------|------|------
Direct match            |  6.3 |  8.0 |  9.0 |  9.3 | 11.3 | 13.3 | 14.0 | 14.0 | 14.3 | 16.0
Global linear transform | 90.0 | 94.0 | 96.7 | 97.3 | 97.7 | 97.7 | 98.3 | 98.3 | 99.0 | 99.0
Patch based             | 96.3 | 97.7 | 98.0 | 98.3 | 98.7 | 98.7 | 99.3 | 99.3 | 99.7 | 99.7
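Rank-k accuracy, as reported in Table 1, counts a probe sketch as correctly recognized if its true identity appears among the k nearest gallery entries. A minimal sketch of the computation (the distance matrix and identity labels below are illustrative, not the paper's data):

```python
import numpy as np

def rank_k_accuracy(dist, true_ids, gallery_ids, k):
    """Fraction of probes whose true identity is among the k nearest gallery
    entries. `dist` is an (n_probes x n_gallery) distance matrix."""
    order = np.argsort(dist, axis=1)[:, :k]            # k best matches per probe
    hits = [true_ids[i] in gallery_ids[order[i]] for i in range(len(true_ids))]
    return float(np.mean(hits))

# toy example: 3 probes against a 4-identity gallery
dist = np.array([[0.1, 0.5, 0.9, 0.7],    # probe 0: nearest is gallery 0
                 [0.8, 0.2, 0.3, 0.9],    # probe 1: nearest is gallery 1
                 [0.6, 0.4, 0.9, 0.3]])   # probe 2: nearest is gallery 3
true_ids = np.array([0, 2, 3])
gallery_ids = np.array([0, 1, 2, 3])
acc1 = rank_k_accuracy(dist, true_ids, gallery_ids, k=1)   # probe 1 misses: 2/3
acc2 = rank_k_accuracy(dist, true_ids, gallery_ids, k=2)   # all hit: 1.0
```

Accuracy is monotonically non-decreasing in k, which is why each row of Table 1 rises from rank 1 to rank 10.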

Conclusion

- Proposes two face sketch synthesis approaches, based on a global linear transform and on Markov random fields
- Face sketch synthesis by linear transform can be improved by separating shape and texture
- Face sketch recognition can be significantly improved by first transforming face photos into sketches

- Image and Video Processing Lab, Department of Electronic Engineering, the Chinese University of Hong Kong
  - Video surveillance
  - Medical imaging
  - Machine learning

- Multimedia Lab, Department of Information Engineering, the Chinese University of Hong Kong
  - Face analysis
  - Image search
  - Video editing
  - 3D reconstruction
  - …

- Multimedia Integration Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences