Fast-BoW: Scaling Bag-of-Visual-Words Generation

Dinesh Singh, Abhijeet Bhure, Sumit Mamtani, and C. Krishna Mohan
Visual Learning and Intelligence Group (VIGIL), Department of Computer Science and Engineering, Indian Institute of Technology Hyderabad, Kandi, Sangareddy-502285, India
Email: {cs14resch11003, cs15btech11001, cs15btech11022, ckm}@iith.ac.in

Motivation

• Due to advances in content generation and sharing techniques, a large amount of visual data is available that can be exploited for a variety of applications such as content-based retrieval, classification, and action/activity recognition.
• The bag-of-visual-words (BoW) is an important technique for the unsupervised representation of visual data based on local feature descriptors [?].
• The BoW generation process involves two phases:
  1. vector quantization for vocabulary generation;
  2. frequency histogram generation.

Overview of the Proposed Work

• Present a framework for automatic detection of motorcyclists driving without helmets in surveillance videos.
• It uses adaptive background subtraction on video frames to obtain moving objects.
• A convolutional neural network (CNN) is used to select motorcyclists among the moving objects.
• A CNN is then applied again to the upper one-fourth of the motorcyclist region for further recognition of motorcyclists driving without a helmet.
• The performance of the proposed approach is evaluated on two datasets.

Challenges

• Real-time implementation
• Occlusion and direction of motion
• Temporal changes in weather conditions
• Quality of video feed [1]

Table 2: Average classification performance (%) over 5-fold cross-validation.

Dataset:Feature      Motorcyclists vs. non-motorcyclists   Helmet vs. without helmet
IITH_Helmet_1:CNN    99.24                                 98.63
IITH_Helmet_1:HOG    98.88                                 93.80
IITH_Helmet_2:CNN    91.81                                 87.11
IITH_Helmet_2:HOG    81.84                                 57.78

Figure panels: (A) Motorcyclists, (B) Helmet.

… gives the best results.

$$\mu_{ij} = (1-\eta)\,\mu_{i,j-1} + \eta f_i \qquad (1)$$

where $\eta = \dfrac{3k \log_3 m}{m}$.

$$p(y_i = j \mid f_i; \theta) = \frac{e^{\theta_j^{T} f_i}}{\sum_{l=1}^{k} e^{\theta_l^{T} f_i}} \qquad (2)$$

The parameter θ is learned by minimizing the class cross-entropy (equivalently, maximizing the log-likelihood of the class labels). The objective function for the softmax classifier with regularization is

$$J(\theta) = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{k} I(j = y_i)\,\log\frac{e^{\theta_j^{T} f_i}}{\sum_{l=1}^{k} e^{\theta_l^{T} f_i}} + \frac{\lambda}{2}\sum_{i=1}^{d}\sum_{j=1}^{k}\theta(i,j)^2 \qquad (3)$$

The loss function in Equation (3) is minimized using scaled conjugate gradient [?].
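Equations (1) to (3) can be illustrated with a small plain-Python sketch. It is not the authors' implementation: it assumes θ is stored as k rows θ_j of length d, takes the learning rate η as an argument instead of computing it, and uses 0-based cluster labels. The function names `online_update`, `softmax_probs`, and `objective` are hypothetical.

```python
import math
from typing import List, Sequence


def online_update(mu: Sequence[float], f: Sequence[float], eta: float) -> List[float]:
    """Eq. (1): move the centre mu a fraction eta of the way toward descriptor f."""
    return [(1.0 - eta) * m_c + eta * f_c for m_c, f_c in zip(mu, f)]


def softmax_probs(theta: Sequence[Sequence[float]], f: Sequence[float]) -> List[float]:
    """Eq. (2): p(y = j | f; theta) for every cluster j (theta holds one row theta_j per cluster)."""
    scores = [sum(w * x for w, x in zip(theta_j, f)) for theta_j in theta]
    top = max(scores)  # shift by the max score for numerical stability; ratios are unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def objective(theta: Sequence[Sequence[float]], feats: Sequence[Sequence[float]],
              labels: Sequence[int], lam: float) -> float:
    """Eq. (3): mean cross-entropy over n descriptors plus (lambda/2) * sum of squared weights."""
    n = len(feats)
    # I(j = y_i) keeps only the log-probability of the true cluster y_i
    cross_entropy = -sum(math.log(softmax_probs(theta, f)[y]) for f, y in zip(feats, labels)) / n
    penalty = 0.5 * lam * sum(w * w for theta_j in theta for w in theta_j)
    return cross_entropy + penalty


theta = [[1.0, 0.0], [0.0, 1.0]]
print(softmax_probs(theta, [2.0, 0.0]))  # cluster 0 receives the larger probability
```

Subtracting the maximum score before exponentiating is a standard numerical safeguard and does not change the probabilities in Equation (2).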


Weight Quantization and Hashing

Figure 4: 2D visualization (t-SNE) of the spread of the extracted features on IITH_Helmet_1 and IITH_Helmet_2. Panels: (C) Motorcyclists, (D) Helmet.

Definition. The quantization function $q : \mathbb{R} \to \mathbb{Z}$ on a real scalar parameter $\theta \in \mathbb{R}$ is defined as

$$z = q(\theta) = \frac{\theta}{\max(\operatorname{abs}(\theta))} \times L$$
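The quantization function q can be sketched in a few lines. Two details are assumptions, not stated on the poster: max(abs(θ)) is taken over all weights being quantized together, and the scaled value is rounded to the nearest integer (implied by q mapping into ℤ, but not written out). The name `quantize` is hypothetical.

```python
from typing import List, Sequence


def quantize(weights: Sequence[float], L: int) -> List[int]:
    """q(theta) = theta / max(abs(theta)) * L, rounded to an integer in [-L, L].

    Assumptions: the max is taken over all weights passed in, and rounding is to
    the nearest integer (Python's round() sends exact .5 ties to the even integer).
    """
    scale = max(abs(w) for w in weights)
    if scale == 0.0:
        return [0 for _ in weights]  # all-zero weights quantize to zero
    return [round(w / scale * L) for w in weights]


print(quantize([0.5, -1.0, 0.25], 4))  # [2, -4, 1]
```

Because every weight is divided by the largest magnitude, the keys always fall in [−L, L], which is the range the hash function h expects.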

         

Figure 5: Experimental results. Panels: (A) Motorcyclists vs. others, (B) Helmet vs. non-helmet.

Definition. The hash function $h : \mathbb{Z} \to \mathbb{Z}^{*}$ on an integer key $z \in \mathbb{Z}$ is defined as

$$h(z) = z + L.$$
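The hash h(z) = z + L shifts a quantized key z ∈ [−L, L] to a non-negative index in [0, 2L], so it can address a plain array of 2L + 1 slots. The sketch below shows one hypothetical way such a table might be used: inputs are grouped by the key of their weight, so a dot product needs only one multiplication per slot. This grouping scheme and the name `grouped_dot` are illustrative assumptions, not necessarily how Fast-BoW uses the hash.

```python
from typing import List, Sequence


def h(z: int, L: int) -> int:
    """h(z) = z + L: maps a key z in [-L, L] to a non-negative index in [0, 2L]."""
    if not -L <= z <= L:
        raise ValueError(f"key {z} outside [-{L}, {L}]")
    return z + L


def grouped_dot(keys: Sequence[int], x: Sequence[float], L: int) -> float:
    """Dot product of quantized weights `keys` with `x`, one multiplication per table slot.

    Hypothetical use of h: slot h(z) accumulates every x_i whose weight equals z.
    """
    table: List[float] = [0.0] * (2 * L + 1)
    for z, x_i in zip(keys, x):
        table[h(z, L)] += x_i
    # slot index s holds key s - L (the inverse of h)
    return sum((s - L) * total for s, total in enumerate(table))


print(grouped_dot([2, -4, 1, 2], [1.0, 2.0, 3.0, 4.0], 4))  # 5.0
```

The result equals the ordinary dot product 2·1 − 4·2 + 1·3 + 2·4 = 5, but the number of multiplications is bounded by 2L + 1 regardless of the vector length.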

Implementation Details
• Ubuntu 16.04 (Xenial Xerus), Python 2.7.12, OpenCV 3.0, Keras 1.1.1, Theano 0.8.2.
• The architecture of the CNN is the same as that used in [3] for the CIFAR dataset.

Dataset

Table 1: Details of the datasets used

Dataset    Classes   Train videos   Test videos   Avg. length (desc.)
KTH        6         383            216           4 sec. (849)
HMDB51     51        3,567          1,530         5 sec. (1456)
UCF101     101       9,535          3,782         7.21 sec. (1574)

Proposed Fast-BoW

Results and Discussion

Summary and Conclusions
• The proposed framework will also assist the traffic police in detecting such violators in adverse environmental conditions, e.g., hot sun.
• In experiments on real videos, the framework detects 92.87% of violators with a low average false alarm rate of 0.5%, which shows the efficacy of the proposed approach.
• This framework can be extended to detect other rule violations as well as to detect and report the number plates of violators.

References
[1] K. Dahiya, D. Singh, and C. K. Mohan, Automatic detection of bike-riders without helmet using surveillance videos in real-time, IJCNN 2016.
[2] B. Duan, W. Liu, P. Fu, C. Yang, X. Wen, and H. Yuan, Real-time on-road vehicle and motorcycle detection using a single camera, IEEE Int. Conf. on Industrial Technology, 2009.

Figure 1: Block diagram

Learning Probability Distribution of the Clusters
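Once each descriptor has a distribution over the k clusters, as in Equation (2), the second BoW phase (frequency histogram generation) can be sketched for both hard and soft assignment. The poster does not spell this step out here, so the choices below are assumptions: hard assignment counts only the most probable cluster, soft assignment adds the full probability vector, and both histograms are normalized by the number of descriptors. The name `bow_histograms` is hypothetical.

```python
from typing import List, Sequence, Tuple


def bow_histograms(probs: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Hard and soft BoW histograms from per-descriptor cluster distributions p(y = j | f_i)."""
    k = len(probs[0])
    hard = [0] * k
    soft = [0.0] * k
    for p in probs:
        best = max(range(k), key=lambda j: p[j])  # hard: count only the most probable cluster
        hard[best] += 1
        for j in range(k):
            soft[j] += p[j]  # soft: spread the descriptor's mass over all clusters
    n = len(probs)
    # normalize both by the number of descriptors so each histogram sums to 1
    return [c / n for c in hard], [s / n for s in soft]


probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
hard, soft = bow_histograms(probs)
print(hard)  # [0.6666666666666666, 0.3333333333333333, 0.0]
```

The hard histogram discards the runner-up clusters, whereas the soft one keeps information about descriptors that lie near a cluster boundary.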

Figure 2: Visualization of the representation learned by the CNN for motorcycle vs. not-motorcycle.

Figure 3: Visualization of the representation learned by the CNN for with-helmet vs. without-helmet.

[3] A. Krizhevsky, I. Sutskever, and G. E. Hinton, ImageNet classification with deep convolutional neural networks, NIPS 2012.

29th British Machine Vision Conference (BMVC), Newcastle upon Tyne, United Kingdom, Sep 3–6, 2018.