Intelligent Video Surveillance in Crowded Scenes
Xiaogang Wang, Department of Electronic Engineering, The Chinese University of Hong Kong
Intelligent Video Surveillance
- The number of surveillance cameras is increasing fast
  - Heathrow airport in London has 5,000 surveillance cameras
  - By 2009, China had installed more than 3,000,000 surveillance cameras; this number will grow by more than 40% per year over the next five years
- Applications
  - Homeland security
  - Anti-crime
  - Traffic control
  - Monitoring children, the elderly and patients at home
  - ...
- Functions
  - Low-level: detect, track and recognize objects of interest
  - High-level: understand activities of objects and detect abnormalities
[Example scenes: crowded scenes vs. sparse scenes; single camera view vs. multiple camera views]
"The requirements for the next generation of video surveillance systems are robustness, reliability, scalability and self-adaptability for crowded, large and complex scenes." (Remagnino et al., Machine Vision and Applications, 2007)
Conventional Approaches for Activity Analysis
- Detection-and-tracking based
  - Detection and tracking are unreliable in crowded scenes because of occlusions
  - Pipeline: object detection and tracking -> trajectories of objects -> activity analysis (typical activity categories, abnormal activities)
Conventional Approaches for Activity Analysis
- Motion based
  - Cannot separate co-occurring activities
  - Pipeline: divide the video sequence into short clips -> compute a motion feature vector per clip -> activity analysis (e.g. walk, wave, run)
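The motion-based pipeline above can be sketched in a few lines: each short clip is summarized by one motion-direction histogram and the clips are then clustered. This is a minimal toy illustration, not the actual surveillance system; the feature quantization, the hand-rolled k-means, and all function names are my own assumptions, and the motion directions are synthetic rather than computed from optical flow.

```python
import numpy as np

def motion_histogram(flow_dirs, n_bins=8):
    """Quantize per-pixel motion directions (radians) into a normalized histogram."""
    bins = ((flow_dirs % (2 * np.pi)) / (2 * np.pi) * n_bins).astype(int)
    hist = np.bincount(bins, minlength=n_bins).astype(float)
    return hist / max(hist.sum(), 1.0)

def kmeans(X, k, iters=50, seed=0):
    """Minimal k-means: returns one cluster label per clip."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy data: 5 clips dominated by rightward motion (around 0 rad)
# and 5 clips dominated by upward motion (around pi/2 rad).
rng = np.random.default_rng(1)
clips = [rng.normal(0.0, 0.2, 500) for _ in range(5)] + \
        [rng.normal(np.pi / 2, 0.2, 500) for _ in range(5)]
X = np.array([motion_histogram(c) for c in clips])
labels = kmeans(X, k=2)
```

Because each clip is collapsed into a single feature vector, two activities happening at the same time end up mixed in one histogram, which is exactly the limitation the slide points out.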
Features of Our Approach
- Detection and tracking are not required
- Separates co-occurring activities
- Works robustly in crowded scenes
- Unsupervised (no need to manually label training data)
- Simultaneously models simple activities, more complicated interactions among objects, and global behaviors in the scene
X. Wang, X. Ma and E. Grimson, “Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, pp. 539-555, 2009.
High-Level Picture of Our Approach
[Figure: (a) motion features; (b) atomic activities modeled as distributions over the feature codebook; (c) global behaviors modeled as distributions over atomic activities]
Parametric Hierarchical Bayesian Model
[Graphical model: global behavior models beta_0, beta_1, beta_2 (L = 2) with hyperparameters mu and eta; for each video clip j (j = 1...M), a behavior label c_j and a clip-level distribution pi_j over atomic activities; for each moving pixel i in clip j, an atomic activity label z_ji and an observed feature value x_ji (i = 1...N_j); atomic activity models phi_1...phi_4 (K = 4) with prior H. An atomic activity captures the spatial distribution and temporal co-occurrence of moving pixels within short video clips.]
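The plate diagram above can be read as a generative process. The sketch below samples synthetic data from a simplified, fully parametric LDA-style version of that hierarchy (behavior -> clip mixture -> atomic activity -> observed feature); the nonparametric machinery of the actual paper (hierarchical Dirichlet processes) is deliberately omitted, and all numeric settings here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

K, L, V, M, N = 4, 2, 50, 20, 100  # atomic activities, behaviors, codebook size, clips, pixels/clip

# Atomic activities phi_k: distributions over the motion-feature codebook.
phi = rng.dirichlet(np.ones(V) * 0.1, size=K)
# Global behaviors beta_c: Dirichlet priors over atomic activities
# (behavior 0 favors activities 0-1, behavior 1 favors activities 2-3).
beta = np.array([[8, 8, 1, 1], [1, 1, 8, 8]], dtype=float)

clips = []
for j in range(M):
    c = rng.integers(L)                  # behavior label c_j of clip j
    pi = rng.dirichlet(beta[c])          # clip-level mixture pi_j over atomic activities
    z = rng.choice(K, size=N, p=pi)      # atomic activity z_ji for each moving pixel
    x = np.array([rng.choice(V, p=phi[k]) for k in z])  # observed feature x_ji
    clips.append((c, x))
```

Inference in the real model runs this process in reverse, recovering phi, pi_j and c_j from the observed moving-pixel features alone, which is why no manual labeling is needed.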
Experimental Results
- The input is a 90-minute-long traffic video sequence
Learned atomic activities from a traffic scene [29 atomic activities, panels (1)-(29)]
Global behavior I: green light for south/north traffic [prior distribution over the index of atomic activities; dominant activities: vehicles northbound, vehicles southbound, vehicles incoming northbound, vehicles incoming southbound, vehicles outgoing eastbound]
Global behavior II: green light for east/west traffic [prior distribution over the index of atomic activities; dominant activities: vehicles incoming westbound, vehicles outgoing westbound, vehicles outgoing southbound, vehicles incoming eastbound, vehicles outgoing eastbound, pedestrians westbound]
Global behavior III: left-turn signal for east/west traffic [prior distribution over the index of atomic activities; dominant activities: vehicles turning left eastbound, vehicles outgoing northbound, vehicles incoming eastbound, vehicles outgoing eastbound, vehicles stopping southbound]
Global behavior IV: walk sign [prior distribution over the index of atomic activities; dominant activities: pedestrians incoming eastbound, pedestrians outgoing eastbound, pedestrians westbound, vehicles stopping]
Global behavior V: northbound right turns [prior distribution over the index of atomic activities; dominant activities: vehicles incoming northbound, vehicles outgoing eastbound]
Temporal Video Segmentation
[The sequence is segmented into the five global behaviors: green light for east/west traffic, walk sign, green light for south/north traffic, northbound right turns, left-turn signal for east/west traffic]
Confusion Matrix of Video Segmentation
[Confusion matrix comparing the clustering result with manual labels]
- The average accuracy is 85.74% using our approach.
- The average accuracy is 65.6% when atomic activities and global behaviors are modeled in two separate steps.
- Approaches that represent a video clip with a single motion feature vector, such as Zhong et al. CVPR'04, perform poorly on this data.
Abnormality detection results
Interaction Query
[Query: vehicles approaching while pedestrians cross the street; query distribution over the 29 atomic activities; top four retrieved jay-walking examples]
Pedestrians/Vehicles Detection Based on Motions
- Atomic activities related to vehicles
- Atomic activities related to pedestrians
- Classify motions into vehicles (red) and pedestrians (green)
Conclusion
- Proposed an unsupervised approach for robust activity analysis in crowded scenes
- Co-occurring activities are separated without detecting and tracking objects
- Using only moving pixels as features, this approach is able to:
  - detect activities
  - analyze interactions among objects
  - temporally segment video sequences into global behaviors
  - detect abnormalities
  - classify motions into pedestrians and vehicles
Face Sketch Synthesis and Recognition
Xiaogang Wang, Department of Electronic Engineering, The Chinese University of Hong Kong
Outline
- Applications
- CUHK face sketch database
- Face sketch synthesis using a global linear model
- Patch-based face sketch synthesis using multi-scale Markov random fields
- Face sketch recognition
Applications
- Law enforcement
- Film industry
- Entertainment
[Example: a query sketch drawn by the artist is matched against face photos in the police mug-shot databases; sketches are synthesized by computer]
CUHK Face Sketch Database (CUFS)
- Publicly available: http://mmlab.ie.cuhk.edu.hk/facesketch.html
- 188 people from the CUHK student data set
- 123 people from the AR database
- 295 people from the XM2VTS database
Learning-Based Face Sketch Synthesis
- Generate a sketch from an input face photo based on a set of training face photo-sketch pairs
- Sketches of different styles can be synthesized by choosing training sets of different styles
[Diagram: input face photo + training set -> synthesized sketch]
Face Sketch Synthesis Using a Global Linear Model
- Express the input photo as a linear combination of the training photos: p ≈ c_1 p_1 + ... + c_n p_n
- Apply the same coefficients to the corresponding training sketches to synthesize the sketch: s = c_1 s_1 + ... + c_n s_n
X. Tang and X. Wang, "Face Sketch Recognition," IEEE Trans. on Circuits and Systems for Video Technology (CSVT), Vol. 14, No. 1, pp. 50-57, 2004.
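The global linear transform can be sketched directly from the two equations on this slide: solve for the combination coefficients on the photos, then reuse them on the sketches. This is a minimal illustration under my own assumptions (raw-pixel least squares on toy arrays); the published method performs the fit in an eigenspace of the training photos, and the function name is hypothetical.

```python
import numpy as np

def synthesize_sketch(photo, train_photos, train_sketches):
    """Global linear transform: express the input photo as a linear combination
    of training photos (photo ~ sum_i c_i p_i), then apply the same coefficients
    to the training sketches (sketch = sum_i c_i s_i)."""
    P = np.stack([p.ravel() for p in train_photos], axis=1)    # columns = photos
    S = np.stack([s.ravel() for s in train_sketches], axis=1)  # columns = sketches
    c, *_ = np.linalg.lstsq(P, photo.ravel(), rcond=None)      # photo ~ P @ c
    return (S @ c).reshape(train_sketches[0].shape)            # sketch = S @ c

# Toy sanity check: if each training "sketch" is the negative of its photo,
# the synthesized sketch of any photo combination is the negative of the input.
rng = np.random.default_rng(0)
photos = [rng.random((8, 8)) for _ in range(5)]
sketches = [-p for p in photos]
query = 0.3 * photos[0] + 0.7 * photos[2]
out = synthesize_sketch(query, photos, sketches)
```

The key assumption of the global model is visible in the code: one set of coefficients is fitted for the whole face, which is why it struggles with local structures and motivates the patch-based method that follows.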
Results
[For each subject: photo, sketch drawn by the artist, synthesized sketch]
Separate Shape and Texture
- Shape transformation: photo shape -> sketch shape, via graph matching on the photo
- Texture transformation: photo texture -> sketch texture
- The transformed shape and texture are combined into the synthesized sketch
X. Tang and X. Wang, "Face Sketch Synthesis and Recognition," in Proceedings of IEEE International Conference on Computer Vision (ICCV), 2003.
Results: without separation vs. with shape and texture separated
[For each subject: photo, synthesized sketch, sketch drawn by the artist]
Patch-Based Face Sketch Synthesis Using Multi-Scale Markov Random Fields
X. Wang and X. Tang, “Face Photo‐Sketch Synthesis and Recognition,” IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 31, pp. 1955-1967, 2009.
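The core idea of the patch-based method is to synthesize each sketch patch from the best-matching training photo patch. The sketch below keeps only that unary matching step on toy arrays; the actual paper additionally places the patches on a multi-scale Markov random field whose pairwise terms enforce compatibility between overlapping neighbors, optimized by belief propagation. The function name, non-overlapping grid, and divisible image sizes are simplifying assumptions of mine.

```python
import numpy as np

def patch_synthesize(photo, train_photos, train_sketches, psize=4):
    """Greedy patch-based synthesis: for each non-overlapping photo patch, find
    the closest training photo patch (sum-of-squared-differences) and copy the
    corresponding sketch patch. Unary term only; no MRF pairwise smoothing."""
    H, W = photo.shape  # assumes H and W are multiples of psize
    out = np.zeros_like(photo)
    # Build the candidate patch library from all training photo-sketch pairs.
    lib_p, lib_s = [], []
    for tp, ts in zip(train_photos, train_sketches):
        for i in range(0, H - psize + 1, psize):
            for j in range(0, W - psize + 1, psize):
                lib_p.append(tp[i:i+psize, j:j+psize])
                lib_s.append(ts[i:i+psize, j:j+psize])
    lib_p, lib_s = np.stack(lib_p), np.stack(lib_s)
    for i in range(0, H, psize):
        for j in range(0, W, psize):
            q = photo[i:i+psize, j:j+psize]
            d = ((lib_p - q) ** 2).sum(axis=(1, 2))   # SSD against every candidate
            out[i:i+psize, j:j+psize] = lib_s[np.argmin(d)]
    return out

# Toy check: a photo identical to a training photo recovers its paired sketch.
rng = np.random.default_rng(0)
photos = [rng.random((8, 8)) for _ in range(3)]
sketches = [1.0 - p for p in photos]
out = patch_synthesize(photos[1], photos, sketches)
```

Without the MRF pairwise terms, independently chosen patches can disagree at their boundaries; the belief-propagation iterations shown on the next slide are what smooth those seams away.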
Results
[Synthesized sketches after 0, 5 and 40 iterations of belief propagation on the Markov random fields, compared with the photo and the sketch drawn by the artist]
[Comparison for each subject: photo, global linear transform result, patch-based result, sketch drawn by the artist]
Sketch Synthesis with Lighting Variations
Sketch Synthesis with Pose Variations
Synthesize Photos from Sketches
[For each subject: sketch by the artist, synthesized photo, real photo]
Face Sketch Recognition
- 306 people for training and 300 people for testing

Table 1. Rank 1-10 recognition accuracy (%) using different face sketch synthesis methods

Methods                 |   1  |   2  |   3  |   4  |   5  |   6  |   7  |   8  |   9  |  10
Direct match            |  6.3 |  8.0 |  9.0 |  9.3 | 11.3 | 13.3 | 14.0 | 14.0 | 14.3 | 16.0
Global linear transform | 90.0 | 94.0 | 96.7 | 97.3 | 97.7 | 97.7 | 98.3 | 98.3 | 99.0 | 99.0
Patch based             | 96.3 | 97.7 | 98.0 | 98.3 | 98.7 | 98.7 | 99.3 | 99.3 | 99.7 | 99.7
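Rank-k accuracy, as reported in the table above, is the fraction of probe sketches whose true identity appears among the k nearest gallery matches. A minimal sketch of that metric on a hand-made toy distance matrix (the function name and data are my own; this is the standard cumulative match characteristic, not code from the paper):

```python
import numpy as np

def cumulative_match(distances, true_ids, max_rank=10):
    """Rank-1..max_rank accuracy: for each probe (row of the distance matrix),
    check whether its true gallery index falls within the k best matches."""
    order = np.argsort(distances, axis=1)   # gallery indices, best match first
    ranks = np.array([int(np.where(order[i] == true_ids[i])[0][0])
                      for i in range(len(true_ids))])
    return [float((ranks < k).mean()) for k in range(1, max_rank + 1)]

# Toy example: 3 probe sketches matched against a gallery of 4 identities.
d = np.array([[0.1, 0.5, 0.9, 0.7],
              [0.8, 0.2, 0.3, 0.9],
              [0.4, 0.3, 0.2, 0.1]])
cmc = cumulative_match(d, [0, 2, 0], max_rank=4)
```

The table's message, stated in terms of this metric: synthesizing a sketch before matching lifts rank-1 accuracy from 6.3% (direct photo-sketch matching) to 96.3% (patch-based synthesis).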
Conclusion
- Proposed two face sketch synthesis approaches, based on a global linear transform and on Markov random fields
- Face sketch synthesis by linear transform can be improved by separating shape and texture
- Face sketch recognition can be significantly improved by first transforming face photos into sketches
- Image and Video Processing Lab, Department of Electronic Engineering, the Chinese University of Hong Kong
  - Video surveillance
  - Medical imaging
  - Machine learning
- Multimedia Lab, Department of Information Engineering, the Chinese University of Hong Kong
  - Face analysis
  - Image search
  - Video editing
  - 3D reconstruction
  - ...
y 多媒体集成实验室,中科院深圳先进技术研究院