Social Network Model for Crowd Anomaly Detection and Localization
Rima Chaker
Zaher Al Aghbari
Imran N. Junejo
{rkhalid | zaher | ijunejo}@sharjah.ac.ae
Abstract
In this work, we propose an unsupervised approach for crowd scene anomaly detection and localization using a social network model. Using a window-based approach, a video scene is first partitioned at spatial and temporal levels, and a set of spatio-temporal cuboids is constructed. Objects exhibiting scene dynamics are detected and the crowd behavior in each cuboid is modeled using local social networks (LSN). From these local social networks, a global social network (GSN) is built for the current window to represent the global behavior of the scene. As the scene evolves with time, the global social network is updated accordingly using LSNs, to detect and localize abnormal behaviors. We demonstrate the effectiveness of the proposed Social Network Model (SNM) approach on a set of benchmark crowd analysis video sequences. The experimental results reveal that the proposed method outperforms the majority, if not all, of the state-of-the-art methods in terms of accuracy of anomaly detection.
Keywords: crowd modeling, social network model, crowd analysis, anomaly detection, anomaly localization, scene understanding, video surveillance.
1. Introduction
A crowd is defined as a collection of a large number of people in a confined space. Socio-psychological studies [49][50] have shown that people in a crowd tend to walk in groups, thus forming collective entities [31], each of which has a specific goal and similar characteristics such as speed and trajectory. Early detection, or prediction, of abnormal behaviors occurring in surveillance scenes is of utmost significance. By alerting human operators, potentially dangerous consequences can be reduced or prevented. However, the analysis of crowded scenes is a very challenging task, because the analysis of human actions is still not a fully solved problem. The significance of understanding crowd scenes is due to its potential in applications such as crowd management [41], video surveillance [3], and public space design [2]. Recently, crowd motion segmentation [42][5], crowd density estimation [7][8], and identifying individuals' behavioral goals within a crowd [6] have all been the subject of active research in different disciplines. This problem presents challenges of great complexity due to: (1) occlusion between individual objects, (2) random variations in the density of people over time, (3) low-resolution videos with dynamic backgrounds, and (4) the inherent difficulty in accurately modeling crowd behavior. What is needed is an automatic system for analyzing crowd scenes and alerting human operators once anomalous activities are detected, so that dangerous situations can be prevented.
Anomaly detection refers to modeling the normal scene behavior and then detecting behavior that does not conform to it. Thus, behavior patterns that appear frequently are referred to as normal behaviors and those appearing rarely are referred to as abnormal behaviors. In [10], anomaly detection is broadly classified into two types, namely local and global.
Local abnormal behavior corresponds to the behavior of a group of objects in a localized region that is different from that of their neighbors in spatio-temporal terms [16]. On the other hand, global abnormal behavior corresponds to the abnormal behavior of a group of objects in the whole scene. The key to accurate detection of abnormal behavior is the selection of an appropriate model that properly models both the local and the global behavior.
Figure 1: A typical scenario (anomalies circled in red). (a) The region of unstable flow of pilgrims circling around the Kabba is detected. (b) Sample frame of an anomaly (bicycle) detected in the UCSD dataset.
Figure 1-(a) denotes a typical scenario. The red circle represents the detected region of unstable flow around the Kabba in Mecca. Another example is illustrated in Figure 1-(b). The appearance of the bicyclist, circled in red, represents an anomaly with respect to the overall behavior of its surrounding neighbors. In this paper, we aim at detecting local and global abnormal behaviors in crowd scenes using a social network model: a data structure consisting of nodes and links between the nodes. In the crowd scene context, nodes can represent people and links reflect the social relationships among the people. First, the unsupervised approach extracts dense tracklets from the crowd motion data in a scene. Second, the video scene is partitioned at spatial and temporal levels; as a result, a set of spatio-temporal cuboids is constructed. The granularity of scene partitioning is proportional to the crowd density. Third, we cluster the objects in each cuboid based on the unique features of their tracklets, such as velocity, curvature, and direction, to build the local social networks, which model the objects' local behavior. Fourth, for each of the subsequent time windows, the global social network is updated incrementally using its local social networks and the previous window's global social network. By analyzing these social networks (local and global), normal (or dominant) behavior and abnormal behavior can be identified. An earlier version of this work appeared in [51].
2. Related Work
Crowd behavior analysis comprises motion information extraction and behavior modeling. The model is then used to distinguish between normal and abnormal behavior. Basharat et al. [5] use object tracking [43] to detect unusual events in image sequences. Similarly, Ali et al. [12,13] track subjects in high-density crowd scenes that are captured from a distance. They learn the direction of motion as prior information based on a force model (floor fields). However, their method requires a manual selection of the individuals to be tracked in the crowd, which hinders automatic recognition of unexpected behavior. Also, floor fields are chaotic in crowded scenes, as they result in highly inconsistent trajectories. For motion modeling, features such as optical flow [11], tracklets [26], or Mixture of Dynamic Textures [16] are extracted at the pixel level. Different models are then built to handle occlusion and clutter. These models include the Gaussian Mixture Model [21], the Social Force Model [10], etc. For example, Mehran et al. [10] explore the socio-psychological concept of "social force" in combination with optical flow to compute interaction forces that are later combined with Latent Dirichlet Allocation to model normal behaviors and detect abnormal ones. This method is further extended in [11] using Particle Swarm Optimization, in addition to the social force model, to optimize the computed interaction force and thus detect global abnormal activities. Ali and Shah [13] utilize the idea of coherent structures in fluid dynamics for segmenting dominant crowd flows and detecting flow instability. Gaidon et al. [38] structure a video as a tree of nested motion components composed of short-duration point trajectories, i.e. tracklets. Chongjing et al. [26] analyze motion patterns by clustering the extracted tracklets in dynamic crowd scenes. The authors of [46] use a spatio-temporal Laplacian eigenmap to extract different crowd activities from videos.
Figure 2: Scenario of detecting anomalies using the social network model: objects are detected and tracked. A spatio-temporal partitioning is constructed, producing a set of spatio-temporal cuboids that capture spatial and temporal features. A hierarchical social network is built to model crowd behavior. At the bottom level of this hierarchical network, a spatial clustering is applied on each cuboid to detect local anomalies in its local social network. Moving up the hierarchical social network, a hierarchical clustering approach is employed to build the global social network. A temporal clustering is then applied on the global social network to detect global anomalies in each time window. An on-line mechanism is applied to update the global social network for any subsequent time windows.
Despite the many different representations of video events, many of the existing works ignore the importance of "contextual" anomaly in the field of crowd analysis. A contextual anomaly arises when an individual exhibits behavior that is similar to that of others but anomalous in a specific context (e.g. neighborhood) [15]. Jiang et al. [15] focus on detecting contextual anomalies in the context of motion using statistical analysis. Leach et al. [18] detect subtle context-dependent behavioral anomalies based on contextual information. Besides the motion information, other works include important object features such as appearance or size. Mahadevan et al. [16] apply Mixture of Dynamic Textures (MDT) to jointly model the appearance and dynamics of crowded scenes. Their approach investigates both temporal and spatial abnormalities. Due to the reported heavy computational cost of [16], Reddy et al. [17] propose a more robust anomaly detection algorithm with relatively low complexity, which analyzes size, motion and texture. An important aspect of crowd behavior analysis is event/behavior recognition. Regular motion patterns such as direction and speed [24,25,40] can be used to estimate the behavior of a crowd in a given environment. Behavior that deviates from the normal behavior is considered abnormal. Two types of approaches are commonly used: the object-based approach and the holistic approach [10]. In object-based
approaches, the crowd is considered as a collection of individuals. Ozturk et al. [24] propose an approach for clustering a set of flow vectors into local dominant motion flows. The local dominant motion flows are later combined to determine the global dominant motion flows in a crowd scene. In holistic approaches, a crowd, or a portion of a crowd, is treated as a single entity to estimate the regular and abnormal motions. For example, Mehran et al. [10] explored the social force model, which is based on socio-psychological studies, to model the behavior of a crowd.
Anomaly Detection Techniques: To ensure public safety, the main objective of crowd analysis involves modeling the crowd dynamics and detecting video anomalies in the scene. However, detecting anomalies in crowd scenes is a challenging task for the following reasons [1][2]:
(1) The large number of moving objects in crowd scenes easily weakens the local anomaly detector.
(2) It is difficult to model the abnormal events, as they are rare and last for a short period of time.
(3) It is difficult to obtain a training dataset that covers every possible normal behavior.
The authors of [48] propose an informative structural context descriptor (SCD), in addition to the 3-D discrete cosine transform (DCT), for describing the crowd individual. Ullah et al. [20], Mehran et al. [10] and Cui et al. [22] detect abnormal events in scenes of escape panics. Ullah et al. [20] initialize a fixed grid of particles to extract the crowd motion features, and adopt a Gaussian Mixture Model [27] to learn the crowd behavior. The closest works to the proposed method are [10] and [22], in that they consider people's social behaviors. Mehran et al. [10] attempt to detect abnormal events with a social force model. A bag-of-words method and Latent Dirichlet Allocation are exploited to discriminate between normal and abnormal frames. Abnormal areas are localized as those exhibiting higher force magnitudes. Cui et al. [22] propose interaction energy potentials to model group activities based on social behavior analysis and finally detect escape panic behavior in a crowd. Saligrama et al. [19] categorize approaches for detecting abnormal behavior in crowd scenes into two types: local abnormal event (LAE) and global abnormal event (GAE). For LAE, most of the state-of-the-art methods, such as Mahadevan et al. [16], extract motion or appearance features from local patches. For GAE, Mehran et al. [10] detect abnormal crowd behavior by adopting the social force model and then using Latent Dirichlet Allocation to discriminate abnormal frames from normal ones. The above methods are often computationally expensive [20]. We propose a simple yet robust approach in which motion features are extracted from corner features by repeatedly generating features-to-track over a temporal window using KLT (Kanade-Lucas-Tomasi) [28,39]. In addition, our method is application-independent: it detects abnormal behaviors in videos from different applications.
The proposed method not only detects anomalous events accurately, but also adapts itself to both spatial and temporal changes witnessed in the environment over time. The overview of the proposed method is shown in Figure 2.
3. Scene Modeling with Social Networks
We are given a set of objects $O = \{o_1, o_2, \dots, o_N\}$ in a crowd scene, where $N$ is the number of objects, each $o_i = (o_{i,1}, o_{i,2}, \dots, o_{i,d})$ is a feature vector, based on spatial and temporal characteristics, describing an individual object, and $d$ is the feature dimensionality. In order to capture the dynamics of the crowd, we extract motion tracklets [24, 25] using the KLT keypoint tracker [39]. A tracklet, $t_i$, is a fragment of a long trajectory tracked across a small number of frames. The short duration of tracklets limits drifting problems, i.e. trajectories deviating from the underlying tracked object.
3.1 Similarity Features
In order to group tracklets that exhibit similar behavior, we focus on selecting the features that account for the (i) direction and magnitude of the motion, (ii) distance between the moving objects, and (iii)
different motion curvature of the object. Thus, we use the following measures:
Cosine Similarity: Let $\phi_i$ and $\phi_j$ denote the dominant directions of tracklets $t_i$ and $t_j$, respectively; the cosine similarity is defined as [36]:
$$S_\phi(t_i, t_j) = \frac{\phi_i \cdot \phi_j}{\|\phi_i\| \, \|\phi_j\|} \qquad (1)$$
Magnitude Similarity: Let $m_i$ and $m_j$ denote the magnitudes (i.e. the distance between the first and the last spatial coordinates) of tracklets $t_i$ and $t_j$, respectively; the magnitude similarity is defined as:
$$S_m(t_i, t_j) = 1 - |m_i - m_j| \qquad (2)$$
Combining both similarity measures $S_\phi(t_i, t_j)$ and $S_m(t_i, t_j)$ linearly produces a weighted similarity measure $S_{\phi m}(t_i, t_j)$ [36], whose two weights sum to one:
$$S_{\phi m}(t_i, t_j) = \alpha \cdot S_\phi(t_i, t_j) + (1 - \alpha) \cdot S_m(t_i, t_j), \quad \text{with } 0 \le \alpha \le 1, \qquad (3)$$
where $\alpha$ is the parameter that balances the effect of direction and magnitude of the two tracklets.
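The direction, magnitude, and weighted measures described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the dominant direction is approximated by the displacement from the first to the last point, the magnitude gap is divided by the larger magnitude so that the value stays in [0, 1] for raw pixel lengths, and the balancing weight `alpha` is applied to the direction term. All function names are hypothetical.

```python
import math

def displacement(tracklet):
    """Vector from the first to the last point of a tracklet [(x, y), ...]."""
    (x0, y0), (x1, y1) = tracklet[0], tracklet[-1]
    return (x1 - x0, y1 - y0)

def cosine_similarity(phi_i, phi_j):
    """Cosine of the angle between two direction vectors (0.0 if either is zero)."""
    norm = math.hypot(*phi_i) * math.hypot(*phi_j)
    if norm == 0:
        return 0.0
    return (phi_i[0] * phi_j[0] + phi_i[1] * phi_j[1]) / norm

def magnitude_similarity(m_i, m_j):
    """1 minus the magnitude gap; scaling by the larger magnitude is an assumption."""
    largest = max(m_i, m_j)
    return 1.0 if largest == 0 else 1.0 - abs(m_i - m_j) / largest

def direction_magnitude_similarity(t_i, t_j, alpha=0.5):
    """Weighted sum: alpha * direction term + (1 - alpha) * magnitude term."""
    d_i, d_j = displacement(t_i), displacement(t_j)
    s_phi = cosine_similarity(d_i, d_j)
    s_m = magnitude_similarity(math.hypot(*d_i), math.hypot(*d_j))
    return alpha * s_phi + (1.0 - alpha) * s_m

east = [(0, 0), (1, 0), (2, 0)]
east_shifted = [(0, 1), (1, 1), (2, 1)]
north = [(0, 0), (0, 1), (0, 2)]
print(direction_magnitude_similarity(east, east_shifted))  # same heading and length
print(direction_magnitude_similarity(east, north))         # perpendicular headings
```

With `alpha = 0.5`, two parallel tracklets of equal length score 1.0, while perpendicular tracklets of equal length score 0.5, because only the magnitude term contributes.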
Velocity Similarity Measure: Velocity is computed for each tracklet, and Dynamic Time Warping (DTW) is used to measure the velocity similarity $S_v(t_i, t_j)$ between two tracklets $t_i$ and $t_j$. DTW uses the local distance measure of Equation (4), which is defined in terms of $d_x(t_i, t_j)$ and $d_y(t_i, t_j)$, the velocity distances between the two tracklets along the x-axis and y-axis, respectively, and of the parameters $\sigma_x$ and $\sigma_y$, the standard deviations in x-velocity and y-velocity, respectively.
Spatio-Temporal Curvature Similarity Measure: This measure, capturing the discontinuity in velocity, acceleration and position of an object, is given by:
$$\kappa = \frac{\sqrt{y''^2 + x''^2 + (x' y'' - x'' y')^2}}{\left(\sqrt{x'^2 + y'^2 + 1}\right)^3} \qquad (5)$$
where $x'$ and $y'$ are the x and y components of the velocity and $x''$ and $y''$ are the x and y components of the acceleration. This measure, denoted by $S_\kappa(t_i, t_j)$, is computed using DTW with the local distance measure of Equation (6), which is defined in terms of $d_\kappa(t_i, t_j)$, the curvature distance between tracklets $t_i$ and $t_j$, and of $\sigma_\kappa$, the standard deviation parameter in spatio-temporal curvature.
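The DTW-based comparison described above can be sketched as follows. This is an illustrative sketch only: the curvature is computed with the commonly used spatio-temporal curvature expression on finite differences with a unit time step (an assumption), and the local distance `scaled_abs_distance` (absolute difference divided by a standard deviation `sigma`) is a hypothetical stand-in for the paper's own local distance measures.

```python
import math

def spatio_temporal_curvature(xs, ys):
    """Per-frame curvature from finite-difference velocity and acceleration."""
    vx = [xs[k + 1] - xs[k] for k in range(len(xs) - 1)]
    vy = [ys[k + 1] - ys[k] for k in range(len(ys) - 1)]
    ax = [vx[k + 1] - vx[k] for k in range(len(vx) - 1)]
    ay = [vy[k + 1] - vy[k] for k in range(len(vy) - 1)]
    kappa = []
    for k in range(len(ax)):
        num = math.sqrt(ay[k] ** 2 + ax[k] ** 2
                        + (vx[k] * ay[k] - ax[k] * vy[k]) ** 2)
        den = math.sqrt(vx[k] ** 2 + vy[k] ** 2 + 1.0) ** 3
        kappa.append(num / den)
    return kappa

def dtw(seq_a, seq_b, local_distance):
    """Classic dynamic time warping cost between two 1-D sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = local_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def scaled_abs_distance(sigma):
    """Hypothetical local distance: absolute difference in units of sigma."""
    return lambda a, b: abs(a - b) / sigma

straight = spatio_temporal_curvature([0, 1, 2, 3], [0, 0, 0, 0])
turning = spatio_temporal_curvature([0, 1, 2, 2], [0, 0, 0, 1])
print(straight, turning)
print(dtw(straight, turning, scaled_abs_distance(sigma=1.0)))
```

A straight, constant-speed tracklet has zero curvature at every frame, whereas the turning tracklet has a positive curvature at the turn, so their DTW cost is strictly positive. Turning a DTW cost into a similarity in [0, 1] is left open here, since the source does not specify that mapping in the recoverable text.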
Figure 3: Spatio-temporal cuboids at various spatial and temporal scales (represented by the upper arrows). Scale representation scheme is performed (represented by the down arrows).
The similarity measures defined above are used by the proposed method (SNM) to cover the following cases:
Cosine Similarity: This covers tracklets with zero Euclidean distance but moving in different directions. They are considered dissimilar by SNM.
Magnitude Similarity: This applies to tracklets that move in the same direction but have different lengths. A short tracklet is not considered similar to a long tracklet.
Velocity Similarity: Spatially dissimilar tracklets moving in the same direction and having almost equal lengths are not considered similar if they exhibit different motion behavior.
Spatio-Temporal Curvature Similarity: Tracklets similar in all the measures defined above but with different curvatures are considered dissimilar.
We can now define our two social similarity measures between two tracklets $t_i$ and $t_j$:
Definition 1 (Velocity based Social Similarity Measure) Let $S_{\phi m}(t_i, t_j)$ denote the direction-magnitude similarity and $S_v(t_i, t_j)$ denote the velocity similarity between the two tracklets $t_i$ and $t_j$; then the social similarity measure between $t_i$ and $t_j$ is defined as
$$S^{v}_{soc}(t_i, t_j) = \beta \cdot S_{\phi m}(t_i, t_j) + (1 - \beta) \cdot S_v(t_i, t_j), \quad \text{with } 0 \le \beta \le 1 \qquad (7)$$
where $\beta$ is a parameter that balances the effect of direction and magnitude on one hand and the velocities of the two tracklets on the other.
Definition 2 (Curvature based Social Similarity Measure) Let $S_{\phi m}(t_i, t_j)$ denote the direction-magnitude similarity and $S_\kappa(t_i, t_j)$ denote the spatio-temporal curvature similarity between the two tracklets $t_i$ and $t_j$; then the social similarity measure between $t_i$ and $t_j$ is defined as
$$S^{\kappa}_{soc}(t_i, t_j) = \gamma \cdot S_{\phi m}(t_i, t_j) + (1 - \gamma) \cdot S_\kappa(t_i, t_j), \quad \text{with } 0 \le \gamma \le 1 \qquad (8)$$
where $\gamma$ is a parameter that balances the effect of direction and magnitude on one hand and the spatio-temporal curvatures of the two tracklets on the other.
The above two measures capture different behaviors of the scene. As we shall show, one of these measures might be more appropriate for a certain crowd scene than the other, depending on the application and the scene dynamics. Thus our social similarity measures are flexible and work with different features depending on the nature of the video.
Figure 4: The procedure of producing LSN per cuboid. (a) We partition the current time window $\Omega_t$ into cuboids $c_1, c_2, \dots, c_n$, using the spatio-temporal partitioning approach. Next, we determine the tracklets within each processed cuboid $c_k$. Tracklets in cuboid $c_k$ are colored differently for clarification. (b) Symmetric adjacency matrix of tracklet nodes' similarity weights. (c) Connected tracklet nodes make up a local social network component, represented by its average-feature centroid (represented by a black dot).
3.2 Spatio-Temporal Partitioning
Inspired by multi-resolution approaches, we sub-divide the input videos into smaller regions. This spatio-temporal partitioning is performed at various spatial and temporal scales, producing a unique set of spatio-temporal volumes. We refer to an individual spatio-temporal volume as a cuboid $c_{i,j}$, where $1 \le i \le n_r$, $1 \le j \le n_c$, and $n_r$ and $n_c$ are the number of rows and columns, respectively, of the spatio-temporal partitions within a window $\Omega_t$. Each 3D spatio-temporal cuboid in a video is of size $n_x \times n_y \times n_f$, in which $n_x \times n_y$ is the spatial dimension of the cuboid and $n_f$ is the depth (or the number of frames). Each cuboid consists of the tracklets found within its dimensions. Therefore, a whole tracklet can belong to one or more cuboids at time window $\Omega_t$. Depending on the dataset and the crowd dynamics, spatial blocks may range from 2 x 2 to m x m cuboids, with a temporal window of f frames. We observed that a shorter duration (< 50 frames) yields erroneous tracklets due to motion blur and self-occlusions; therefore, in our experiments we set f to 50. Figure 3 illustrates the construction of the video hierarchy forming spatio-temporal cuboids at various levels, i.e. 2 x 2, 4 x 4 or 8 x 8: the higher the density of the crowd, the higher the granularity of the partitioning needed to capture the details of the scene dynamics (illustrated by the right-to-left arrows in Figure 3).
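The assignment of tracklets to the cuboids of one time window can be sketched as below. This is a minimal illustration that assumes a uniform grid, non-negative pixel coordinates, and tracklets already restricted to the frames of the window; the function names, the frame size, and the tracklet identifiers are hypothetical.

```python
def cells_for_tracklet(points, frame_w, frame_h, rows, cols):
    """Set of (row, col) grid cells that a tracklet's points fall into."""
    cells = set()
    for x, y in points:
        col = min(int(x * cols / frame_w), cols - 1)  # clamp points on the far edge
        row = min(int(y * rows / frame_h), rows - 1)
        cells.add((row, col))
    return cells

def partition_window(tracklets, frame_w, frame_h, rows, cols):
    """Map each grid cell to the ids of the tracklets passing through it."""
    grid = {}
    for tracklet_id, points in tracklets.items():
        for cell in cells_for_tracklet(points, frame_w, frame_h, rows, cols):
            grid.setdefault(cell, []).append(tracklet_id)
    return grid

window_tracklets = {
    "a": [(10, 10), (20, 12)],     # stays in the top-left cuboid
    "b": [(110, 70), (130, 90)],   # crosses into the bottom-right cuboid
}
grid = partition_window(window_tracklets, frame_w=238, frame_h=158, rows=2, cols=2)
print(grid)
```

Tracklet "b" appears in two cells, which mirrors the statement that a whole tracklet can belong to one or more cuboids.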
3.3 Building Social Networks
A social network is represented as a graph [30] where nodes represent objects and edges represent social interactions between people [29]. That is, each tracklet is represented by a node in the social network model, and the edge between two nodes represents the social interaction between these two nodes. The social interaction weights are based on our social similarity measure, Equation (7) or Equation (8). On a graph, the geodesic between two nodes is a path connecting the nodes with the smallest number of edges. Since similarly behaving tracklets need to be spatially close to each other, in addition to the social similarity measure we use the closeness centrality among connected nodes representing tracklets, for pruning only. The closeness centrality is defined as the inverse of the average distance to all other nodes [44]. If similar nodes are spatially distant (greater than a threshold $\tau_c$), their connecting edge is deleted. This is then followed by applying the connected component algorithm [35] to the whole network to find the connected components of the social network in each cuboid. Each extracted connected component is considered as a cluster, denoted as the local social network (LSN). The aim is to identify the different dynamics of the scene, represented by the clusters in the network.
3.3.1 Building Local Social Networks (LSN)
The cluster obtained above is denoted by its centroid $\bar{c}$, computed as the mean of the spatial $(x, y)$, direction $(\phi)$, magnitude $(m)$, velocity $(v)$ and/or curvature $(\kappa)$ features of the tracklets belonging to a cluster $C$:
$$\bar{c} = (\bar{x}, \bar{y}, \bar{\phi}, \bar{m}, \bar{v}, \bar{\kappa}) \qquad (9)$$
By finding the connected components, as defined above, we end up with a number of cluster(s) $C$ within each cuboid, referred to as the local social network, $LSN$. Figure 4 shows an example of processing one cuboid (shaded in red). The six extracted tracklets in this cuboid are colored differently for clarification. Also, a node is colored by its tracklet color for ease of referencing. Algorithm 1 uses a threshold $\tau_s$ on the computed social similarity measure between two tracklets $t_i$ and $t_j$, and a threshold $\tau_c$ on the computed closeness centrality measure between tracklets. The results are stored in a symmetric similarity adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $n$ is the number of tracklet nodes. The non-zero values of this adjacency matrix represent the weights of similarity among tracklet nodes, where zero indicates dissimilarity and one indicates highest similarity. Finally, for each component of a local social network, $LSN_i$, its centroid, $\bar{c}_i$, is computed (represented by a black dot, Figure 4, bottom right). Hence, the algorithm outputs the local social network component(s) per cuboid, including the corresponding centroid(s).
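The LSN construction for one cuboid (thresholded similarity adjacency matrix, pruning of spatially distant pairs, connected components, and per-component centroids) can be sketched as follows. This is not the paper's Algorithm 1: the closeness-centrality pruning is replaced here by a plain Euclidean distance threshold `tau_d`, and the similarity function, thresholds, and feature layout are hypothetical placeholders.

```python
import math

def local_social_networks(features, positions, similarity, tau_s, tau_d):
    """Return (adjacency matrix, connected components) for one cuboid."""
    n = len(features)
    adjacency = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weight = similarity(features[i], features[j])
            gap = math.hypot(positions[i][0] - positions[j][0],
                             positions[i][1] - positions[j][1])
            # Keep an edge only for similar AND spatially close tracklets.
            if weight >= tau_s and gap <= tau_d:
                adjacency[i][j] = adjacency[j][i] = weight

    components, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        stack, members = [start], []
        seen.add(start)
        while stack:  # depth-first search over non-zero edges
            node = stack.pop()
            members.append(node)
            for other in range(n):
                if adjacency[node][other] > 0 and other not in seen:
                    seen.add(other)
                    stack.append(other)
        components.append(sorted(members))
    return adjacency, components

def centroid(features, members):
    """Mean feature vector of the tracklets in one component."""
    dims = len(features[0])
    return tuple(sum(features[k][d] for k in members) / len(members)
                 for d in range(dims))

features = [(0.10,), (0.15,), (0.12,), (0.90,)]        # one toy feature per tracklet
positions = [(0, 0), (1, 0), (50, 50), (0.5, 0.5)]     # tracklet locations
toy_similarity = lambda a, b: 1.0 - abs(a[0] - b[0])   # placeholder similarity
adjacency, components = local_social_networks(
    features, positions, toy_similarity, tau_s=0.8, tau_d=5.0)
print(components)
print([centroid(features, members) for members in components])
```

In this toy cuboid, tracklets 0 and 1 form one component; tracklet 2 is similar to them but too far away, and tracklet 3 is nearby but dissimilar, so each of those ends up as an isolated single-node component.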
3.3.2 Building Global Social Networks (GSN)
The GSN gives a general view of the activities occurring in a time window $\Omega_t$. Hierarchical Agglomerative Clustering (HAC) [34] is applied to merge similar LSNs from different cuboids in a time window into a global social network, $GSN$, in a hierarchical fashion. Merging two components of the local social networks (say $LSN_i$ and $LSN_j$) is based on the social similarity, Equation (7) or Equation (8), between their centroids, i.e. $\bar{c}_i$ and $\bar{c}_j$, respectively. That is, if the social similarity value between $\bar{c}_i$ and $\bar{c}_j$ is above the threshold ($\tau_s$), then $LSN_i$ and $LSN_j$ are merged together to make a bigger social network and its new centroid is computed.
This process continues up the hierarchy until no more merging is possible. The resulting global social network is considered a GSN that may consist of one or more components. This bottom-up approach, as shown in Figure 5, aims to merge similar LSNs from different cuboids and finally discover the global social network from the LSNs within time window $\Omega_t$. This is shown in the GlobalSocialNetwork algorithm, which takes as input the local social networks of all cuboids within time window $\Omega_t$, including the representative centroid of each LSN. The results of the LSN-to-LSN comparison are stored in a symmetric similarity adjacency matrix, as an undirected graph. The non-zero values of this adjacency matrix represent the weights of similarity among LSN components, and zero indicates dissimilarity.
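The bottom-up merging of LSN components into GSN components can be sketched as a simple greedy agglomerative loop over centroids. This is an illustrative sketch rather than the GlobalSocialNetwork algorithm itself: it merges the most similar pair first, recomputes the merged centroid as a size-weighted mean (an assumption), and uses a hypothetical similarity function and threshold.

```python
def build_global_social_network(lsns, similarity, tau_s):
    """Greedy agglomerative merging of LSN components.

    lsns: list of (centroid_tuple, size). Returns a list of
    (centroid_tuple, size, member_indices) GSN components.
    """
    clusters = [(c, size, [k]) for k, (c, size) in enumerate(lsns)]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = similarity(clusters[a][0], clusters[b][0])
                if s >= tau_s and (best is None or s > best[0]):
                    best = (s, a, b)
        if best is None:            # no pair is similar enough: stop merging
            return clusters
        _, a, b = best
        (ca, na, ma), (cb, nb, mb) = clusters[a], clusters[b]
        merged = tuple((x * na + y * nb) / (na + nb) for x, y in zip(ca, cb))
        clusters[a] = (merged, na + nb, ma + mb)
        del clusters[b]

toy_similarity = lambda a, b: 1.0 - abs(a[0] - b[0])   # placeholder similarity
lsn_components = [((0.0,), 10), ((0.1,), 30), ((1.0,), 5)]
gsn = build_global_social_network(lsn_components, toy_similarity, tau_s=0.8)
print(gsn)
```

Here the first two LSN components merge into one GSN component of size 40, while the third stays separate because its centroid is not similar enough to the merged centroid.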
3.4 Anomaly Detection
The social similarity measure and the size of a social network are essential for detecting abnormal behavior. The social similarity measure separates the rare actions from the dominant ones. That is, a resulting social network with very few nodes is regarded as behavior deviating from the other, dominant social network(s). Thus, isolated and small (few-node) social networks are marked as anomalies.
If the relative local size of the tested local social network component $LSN_i$ in a cuboid, i.e. the ratio of the size of $LSN_i$ to that of the largest local social network component in the cuboid, is less than $t_s$, then $LSN_i$ is classified as an anomaly. $t_s$ is set to 0.5 in our experiments:
$$\frac{|LSN_i|}{\max_{k} |LSN_k|} < t_s, \quad 1 \le k \le n \qquad (10)$$
where $n$ is the number of local social network components in the cuboid.
Table 1 shows an example of a window $\Omega_t$ consisting of 50 frames partitioned into 2 x 2 spatio-temporal cuboids. Processing one of these cuboids, for instance, produces seven LSN components, of which four are normal and the other three are abnormal. As shown in the Anomaly Detection Algorithm, the abnormality classification is based on the social similarity measure, Equation (7) or Equation (8), followed by the size of an LSN relative to the largest local social network component within the cuboid, Equation (10). Once the anomalous LSN is identified, the localization is simply determined by using the spatial features of the tracklet members in the anomalous LSN. The social similarity measure isolates the anomalous components from the normal components. Then, to identify those anomalous components, the size feature is used (see Anomaly Detection Algorithm).
Within each window $\Omega_t$, global anomalies at the top level of the hierarchy are identified using Equation (11): if the relative global size of a target GSN component $GSN_j$ in $\Omega_t$, i.e. the ratio of the size of $GSN_j$ to that of the largest GSN component, is less than $t_g$, then $GSN_j$ is classified as anomalous:
$$\frac{|GSN_j|}{\max_{k} |GSN_k|} < t_g, \quad 1 \le k \le n_g \qquad (11)$$
where $n_g$ is the number of GSN components in $\Omega_t$.
An example of global anomaly detection is shown in Table 2. By using the hierarchical partitioning scheme, we can zoom in to finer details of the crowd behavior, which increases the efficiency of detecting and localizing anomalies, especially local anomalies. Also, as we move up the hierarchy, certain tracklet nodes classified as abnormal in lower-level LSN component(s) might be merged with other nodes in higher-level normal LSN component(s), and vice versa.
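The size-based test and the localization step described above can be sketched as follows. This is a minimal illustration: it covers only the relative-size check (not the preceding similarity-based separation), the bounding-box localization is one simple way to use the spatial features of the member tracklets, and the threshold of 0.5 is the local value reported in the text, reused for the global case purely as an assumption.

```python
def relative_sizes(sizes):
    """Size of each component divided by the size of the largest one."""
    largest = max(sizes)
    return [size / largest for size in sizes]

def flag_anomalies(sizes, threshold):
    """True for components whose relative size is below the threshold."""
    return [ratio < threshold for ratio in relative_sizes(sizes)]

def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around tracklet points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))

gsn_sizes = [345, 21, 274]   # tracklet counts listed in Table 2
print(flag_anomalies(gsn_sizes, threshold=0.5))
print(bounding_box([(12, 40), (15, 44), (19, 43)]))  # hypothetical anomalous tracklet points
```

For the tracklet counts of Table 2, only the second component falls below half the size of the largest one, which agrees with the anomaly column of that table.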
3.5 GSN-Update
The proposed hierarchical model maintains a link between local and global social networks. In this phase, we seek to learn any newly observed events, and in turn update the global social network, implicitly capturing any changes in the bottom level of the hierarchy, i.e. the local social networks. The GSN-Update process works as follows (illustrated in Figure 6): for every two successive windows, the cluster centroids of Equation (9) are compared, instead of performing a tracklet-to-tracklet comparison, which demands a high computational time. GSN components of the current window $\Omega_t$ are merged with those of the previous window $\Omega_{t-1}$; the corresponding GSNs are therefore compared. Two GSN components are merged only if their centroids exhibit feature similarity. Non-matching GSN components from windows $\Omega_{t-1}$ and $\Omega_t$ are dealt with as follows:
a. Non-matching global social network component(s) that belong to the recently processed time window $\Omega_{t-1}$ are destroyed.
b. Non-matching global social network component(s) that belong to the current time window $\Omega_t$ are preserved.
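The update rules above can be sketched as a single pass over the GSN components of two successive windows. This is a sketch under stated assumptions: each current component is matched to its most similar previous component, matched pairs are merged with a size-weighted centroid and summed sizes, and the similarity function and threshold are hypothetical placeholders.

```python
def gsn_update(previous, current, similarity, tau_s):
    """Merge matching GSN components of two successive windows.

    previous, current: lists of (centroid_tuple, size).
    Unmatched previous components are dropped (rule a);
    unmatched current components are kept as they are (rule b).
    """
    updated = []
    for c_cur, n_cur in current:
        best = None
        for c_prev, n_prev in previous:
            s = similarity(c_prev, c_cur)
            if s >= tau_s and (best is None or s > best[0]):
                best = (s, c_prev, n_prev)
        if best is None:
            updated.append((c_cur, n_cur))            # rule (b): preserve
        else:
            _, c_prev, n_prev = best
            total = n_prev + n_cur
            merged = tuple((p * n_prev + q * n_cur) / total
                           for p, q in zip(c_prev, c_cur))
            updated.append((merged, total))
    return updated                                     # rule (a): the rest is dropped

toy_similarity = lambda a, b: 1.0 - abs(a[0] - b[0])   # placeholder similarity
previous_gsn = [((0.0,), 100), ((0.5,), 20)]
current_gsn = [((0.05,), 80), ((1.0,), 10)]
updated_gsn = gsn_update(previous_gsn, current_gsn, toy_similarity, tau_s=0.9)
print(updated_gsn)
```

The first current component is merged with its matching previous component, the second current component is preserved unchanged, and the unmatched previous component with centroid 0.5 disappears from the result.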
Figure 5: Constructing the GSN by hierarchically grouping similar LSN components from different cuboids. (a) Once we obtain the local social network components $LSN_i$, the hierarchical clustering algorithm is employed to obtain a coarser view of the scene. (b) The bottom-up approach aims to merge similar local social network components from different cuboids towards discovering the global social network within time window $\Omega_t$.
As an example, Figure 6 shows three windows $\Omega_{t-1}$, $\Omega_t$ and $\Omega_{t+1}$. GSN-Update on windows $\Omega_{t-1}$ and $\Omega_t$ merges similar GSNs, i.e. the red components and the blue components, respectively. The non-matching component of the recently processed window satisfies condition (a) above and is destroyed. The non-matching component of the current window satisfies condition (b) above and is preserved. The result of GSN-Update between window $\Omega_{t-1}$ and window $\Omega_t$ is used as the input for the successive windows in the GSN-Update process, i.e. window $\Omega_{t+1}$, and so on.
Table 1: Example of local anomaly detection. The time window is partitioned into 2 x 2 cuboids; the cuboid enclosed in red produces 7 local social network components, of which 4 are classified as normal social network components and 3 exhibit abnormal behaviors. Analysis of the local social network of this cuboid in time window $\Omega_t$:
| LSN Component No. | Dominant Feature(s) | No. of Tracklets | Local Anomaly |
| LSN1 | Direction & Magnitude | 1 | Yes |
| LSN2 | Magnitude | 24 | No |
| LSN3 | Direction & Magnitude | 29 | No |
| LSN4 | Direction & Magnitude | 15 | No |
| LSN5 | Direction & Magnitude | 14 | No |
| LSN6 | Direction & Magnitude | 8 | Yes |
| LSN7 | Direction & Magnitude & Velocity | 7 | Yes |
Table 2: Example of global anomaly detection. Window 1 produces 3 global social network components: GSN1 colored in green, GSN2 colored in red, and GSN3 colored in yellow. Out of the three global social network components, 2 are classified as normal social network components and 1 exhibits abnormal activities. Analysis of the global social network in time window $\Omega_1$:
| GSN Component No. | Dominant Feature(s) | No. of Tracklets | Global Anomaly |
| GSN1 | Direction | 345 | No |
| GSN2 | Direction & Velocity | 21 | Yes |
| GSN3 | Direction | 274 | No |
Figure 6: GSN-Update on windows $\Omega_{t-1}$, $\Omega_t$ and $\Omega_{t+1}$. Global social network components are labeled with the same window index. Similar global social network components from different windows have the same network shape and color. Similar global social network components are merged together as shown in level 1. The red global social network components and the blue global social network components are merged, respectively. A global social network component of the recently processed window that has no match is destroyed. New non-matching global social network components in the current window are preserved. The result of GSN-Update between window $\Omega_{t-1}$ and window $\Omega_t$ is used as the input for the successive window, $\Omega_{t+1}$, in the GSN-Update process.
4. Experiments & Results
All experiments were run on a PC with an Intel(R) Core(TM) i5 3.10GHz CPU and 4GB RAM, using a MATLAB implementation of the proposed method. We have used the following publicly available datasets:
UCSD Dataset: The UCSD anomaly detection dataset¹ was acquired with an elevated stationary camera overlooking pedestrian walkways on the UCSD campus. The dataset represents a real scene in which the abnormalities occur naturally. It contains videos of two different pedestrian scenes, namely UCSD Ped1, containing groups of people walking towards and away from the camera with some amount of perspective distortion, and UCSD Ped2, containing groups of people walking parallel to the camera plane. The crowd density in the walkways is variable, ranging from sparse to crowded. The normal events contain only pedestrians. The abnormal events are due to either: 1) the appearance of non-pedestrian entities in the walkways, and/or 2) anomalous pedestrian motion patterns. Commonly occurring anomalies include small carts in the scene, skaters, bikes, and people in wheelchairs. The UCSD dataset contains both frame-level ground truth and pixel-level ground truth.
¹ http://www.svcl.ucsd.edu/projects/anomaly
UCD Dataset: The UCD dataset² contains two outdoor videos of students moving across two buildings, lasting for 12 and 5 minutes, respectively. Each sequence is segmented into two different subsequences with people mainly moving in a horizontal direction in the scene. This dataset defines an anomaly as a deviation from what has been observed beforehand. The ground truth consists of the frame numbers in the scene at which someone starts moving against the dominant crowd motion. In our experiments, the balancing parameters of Equation (7) and Equation (8) are determined experimentally to be 0.4 and 0.8, respectively.
4.1 Performance Evaluation
For both local and global scene understanding and anomaly detection we use the two criteria of [16]. Under the frame-level criterion, a frame is considered anomalous, and denoted as positive, if it contains at least one abnormal pixel. Under the pixel-level criterion, a frame is considered anomalous if (i) it is positive and (ii) at least 40 percent of its anomalous pixels are correctly identified. For the GSN evaluation, the Receiver Operating Characteristic (ROC) curve is computed and the Area Under the Curve (AUC) is used for comparison. In addition, we measure [16]:
Equal Error Rate (EER): the percentage of misclassified frames when the false positive rate (FPR) is equal to the false negative rate (miss rate), i.e. FPR = 1 - true positive rate (TPR). EER is calculated for both the pixel-level and the frame-level analyses.
Rate of Detection (RD): the detection rate at the equal error point for the anomaly localization component, i.e. RD = 1 - EER [16].
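The criteria and measures defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB implementation used in the experiments: the function names, the representation of masks as nested lists of 0/1 values, and the linear interpolation used to locate the equal error point are all choices made for this example.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # hypothetical mask format: rows of 0/1 pixels


def frame_level_positive(pred: Mask) -> bool:
    """Frame-level criterion: positive if at least one pixel is flagged."""
    return any(any(row) for row in pred)


def pixel_level_hit(pred: Mask, truth: Mask, min_cover: float = 0.4) -> bool:
    """Pixel-level criterion: the frame is positive and at least 40% of the
    truly anomalous pixels are covered by the prediction."""
    n_truth = sum(1 for row in truth for t in row if t)
    if n_truth == 0 or not frame_level_positive(pred):
        return False
    covered = sum(1 for prow, trow in zip(pred, truth)
                  for p, t in zip(prow, trow) if p and t)
    return covered / n_truth >= min_cover


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Sweep a threshold over the anomaly scores; return (FPR, TPR) pairs."""
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and not y)
        points.append((fp / neg, tp / pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def equal_error_rate(points: Sequence[Tuple[float, float]]) -> float:
    """FPR at the point where FPR equals the miss rate 1 - TPR
    (linear interpolation between neighbouring ROC points)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d0 = x0 - (1 - y0)  # FPR minus miss rate at the left point
        d1 = x1 - (1 - y1)  # the same difference at the right point
        if d0 <= 0 <= d1:
            if d1 == d0:
                return x0
            return x0 + (-d0 / (d1 - d0)) * (x1 - x0)
    raise ValueError("ROC curve must run from (0, 0) to (1, 1)")


# Toy scores and labels, invented for the example (1 = anomalous frame).
pts = roc_points([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1],
                 [1, 1, 0, 1, 0, 1, 0, 0])
eer = equal_error_rate(pts)
rd = 1 - eer  # rate of detection at the equal error point
```

The same ROC, AUC and EER routines apply to the frame-level and pixel-level analyses; only the rule that decides whether a frame counts as a true positive changes.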
A. UCSD dataset – Local Social Network Evaluation
We use 200 frames at a resolution of 158 x 238. To cover the scene completely, 300 features are detected with a minimum distance of 3. We partition the sequence into 4 time windows of 50 frames each, and each time window is partitioned into 8 x 8 spatio-temporal cuboids. The sequence contains the biker anomaly. The results are compared against the ground truth in terms of frame accuracy and pixel accuracy, and both measures are averaged for each time window. In Table 3, a green cuboid marks a false positive, that is, a cuboid in which some LSNs were flagged as abnormal. An orange cuboid marks a true positive abnormal activity, and the abnormal region is enclosed in a red border. For instance, the first time window Ω, from frame 1 to frame 50, contains 7 false positive cuboids. These arise from tracklets that exhibit rare features relative to the surrounding neighborhood. Cuboid 12, for example, yields an abnormal LSN whose tracklets have a short magnitude compared with the dominant, longer tracklets around it. In the next time window Ω, from frame 51 to frame 100, two of the four abnormal cuboids are detected correctly: the abnormal LSN components in cuboid 47 and cuboid 55. The abnormal LSN components in cuboids 46 and 54 are wrongly classified as normal because (i) in cuboid 46, the incomplete abnormal tracklets exhibit features similar to those of their neighbors and were therefore assigned to the normal LSN component, and (ii) cuboid 54 contains only one tracklet. In the window Ω from frame 101 to frame 150, the tracklets in cuboid 54 were removed.
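As an illustration of the partitioning described above, the following sketch maps a frame number to its time window and a pixel position to its cuboid in the 8 x 8 grid. The helper names are hypothetical, and the row-major, 1-based cuboid numbering is an assumption; it is merely consistent with cuboids 46/47 and 54/55 being reported as neighbors, and the paper does not state the convention. The frame size is taken as 158 rows by 238 columns.

```python
def window_index(frame: int, frames_per_window: int = 50) -> int:
    """1-based time window of a 1-based frame number (frames 1-50 -> window 1)."""
    return (frame - 1) // frames_per_window + 1


def cuboid_index(x: int, y: int, width: int = 238, height: int = 158,
                 grid: int = 8) -> int:
    """1-based, row-major index of the grid cell containing pixel (x, y).

    The numbering convention is an assumption made for this sketch.
    """
    col = min(x * grid // width, grid - 1)
    row = min(y * grid // height, grid - 1)
    return row * grid + col + 1


# Frames 1-50, 51-100, 101-150 and 151-200 fall into windows 1 to 4.
windows = [window_index(f) for f in (1, 50, 51, 100, 101, 150, 151, 200)]

# Under the assumed numbering, horizontally adjacent cells differ by 1
# and vertically adjacent cells differ by 8.
cells = [cuboid_index(170, 110), cuboid_index(200, 110),
         cuboid_index(170, 130), cuboid_index(200, 130)]
```

The pixel coordinates in the usage lines are arbitrary test points, not locations of the anomalies in the dataset.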
Footnote 2: http://mmlab.science.unitn.it/UCD/
Table 3: Average frame accuracy and average pixel accuracy of the LSN algorithm. The sequence is partitioned into 4 time windows, each consisting of 50 frames and partitioned into 8 x 8 spatio-temporal cuboids. Green-colored blocks represent false positive abnormal behavior, while orange-colored blocks represent true positive abnormal activity. For each time window, the frame accuracy and pixel accuracy of the abnormal local social network components are averaged.
Frame sequence | 8 x 8 spatio-temporal cuboids | Anomalous cuboid | Frame accuracy (%) | Pixel accuracy (%)
1 – 50 | [image] | Average accuracy | 0 | 0
51 – 100 | [image] | 55 | 66 | 69
 | | 54 | 0 | 0
 | | 47 | 67 | 66
 | | 46 | 0 | 0
 | | Average accuracy | 33.3 | 33.8
101 – 150 | [image] | 47 | 80 | 84
 | | 46 | 79 | 75
 | | 45 | 76 | 77
 | | 37 | 79 | 83.2
 | | 36 | 77 | 83
 | | Average accuracy | 78.2 | 80.4
151 – 200 | [image] | 37 | 79.4 | 83.2
 | | 36 | 80 | 83
 | | 28 | 82 | 89.2
 | | 27 | 76 | 82
 | | 19 | 76 | 79
 | | Average accuracy | 78.7 | 83.3
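The per-window averages in the table can be reproduced from the per-cuboid values. The short check below is a sketch written for this purpose; the grouping of values into windows follows our reading of the table, and the use of half-up rounding to one decimal place is an assumption that happens to match the reported figures.

```python
from decimal import Decimal, ROUND_HALF_UP

# (frame accuracy, pixel accuracy) per anomalous cuboid, copied from the table.
WINDOWS = {
    "51-100": {55: ("66", "69"), 54: ("0", "0"),
               47: ("67", "66"), 46: ("0", "0")},
    "101-150": {47: ("80", "84"), 46: ("79", "75"), 45: ("76", "77"),
                37: ("79", "83.2"), 36: ("77", "83")},
    "151-200": {37: ("79.4", "83.2"), 36: ("80", "83"), 28: ("82", "89.2"),
                27: ("76", "82"), 19: ("76", "79")},
}


def mean_1dp(values):
    """Arithmetic mean rounded half-up to one decimal place."""
    total = sum((Decimal(v) for v in values), Decimal(0))
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def window_averages(cuboids):
    """Average frame accuracy and pixel accuracy over a window's cuboids."""
    frame = mean_1dp([f for f, _ in cuboids.values()])
    pixel = mean_1dp([p for _, p in cuboids.values()])
    return frame, pixel


for name, cuboids in WINDOWS.items():
    frame_avg, pixel_avg = window_averages(cuboids)
    print(f"frames {name}: frame {frame_avg}, pixel {pixel_avg}")
```

Cuboids that were missed entirely (accuracy 0) are included in the mean, which is why the second window averages only about 33 percent although two of its cuboids score above 66.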
Moreover, cuboid 46 gained more feature information about the abnormal tracklets, which were therefore distinguished from their neighborhood. The same applies to cuboid 27 in the window Ω from frame 151 to frame 200. Table 3 further shows that, as more tracklet information is gained or new behavior is captured by newly produced tracklets, both the average frame accuracy and the average pixel accuracy increase with time. Moreover, each time window reflects the changing environment in the scene. As an example, cuboid 55 in Ω was removed from the abnormal behaviors, reflecting the ongoing crowd scenario in the video. Another example is cuboid 36, which was wrongly classified as normal in window Ω but was later detected as abnormal, with an average frame accuracy of 78.5 and an average rate of detection of 83 over the next two windows.
B. UCSD dataset – Global Social Network Evaluation
For performance comparison, we choose six state-of-the-art methods. Five of them are the Mixture of Dynamic Texture (DTM) [16], the Social Force Model (SF) [10], the Mixture of Optical Flow (MPPCA) [45], the Social Force Model with MPPCA (MPPCA+SF) [16], and the Optical Flow Monitoring (Adam’s) [46]; the quantitative results of these five methods are obtained from [16]. The sixth is the Sparse Reconstruction Cost (Sparse) of [23]. Our proposed method is abbreviated as SNM (Social Network Model). Two frame-level ROC curves are produced, for UCSD Ped1 and UCSD Ped2, as shown in Figure 7 and Figure 8, respectively. As UCSD Ped2 does not provide pixel-level ground truth, we only present the pixel-level ROC curve of UCSD Ped1, in Figure 9. In addition, Figure 10 shows the Equal Error Rate (EER) of our approach and the state-of-the-art methods.
Figure 7: Frame-level ROC curves (TPR versus FPR) on UCSD Ped1 dataset. Left: Our proposed approach SNM. Right: The state-of-the-art methods from [23].
Figure 8: Frame-level ROC curve (TPR versus FPR) of our proposed approach SNM on UCSD Ped2 dataset.
Figure 9: Pixel-level ROC curves (TPR versus FPR) on UCSD Ped1 dataset. Left: Our proposed approach SNM. Right: The state-of-the-art methods from [23].
We also calculated the Area Under the Curve (AUC) values (Table 4) and the Rate of Detection (RD) values (Table 5). Missing entries indicate unavailable results. On the frame-level ROC curves, our approach shows a higher detection rate than the existing methods on UCSD Ped2 and is slightly below Sparse [23] on UCSD Ped1. On the pixel-level ROC curve for UCSD Ped1, our approach outperforms all the state-of-the-art methods. Our frame-level EER on UCSD Ped1 (about 20%) is lower than that of every compared method except Sparse [23] (about 19%), see Figure 10. Under the more precise pixel-level criterion on UCSD Ped1, our rate of detection is 48.5%, which is higher than the 46% of Sparse [23] and the highest among the compared methods (Table 5). Our AUC averaged over UCSD Ped1 and UCSD Ped2 is 86.7%, which is also higher than that of the other methods, including Sparse [23] with an average AUC of 86.1% (Table 4). This suggests that the good frame-level detection rates of the remaining approaches may partly be due to “lucky hits” under the frame-level criterion.
Example frames with the anomalies detected by the proposed approach and by some state-of-the-art methods are shown in Figure 11, where the abnormal events are labeled with red masks. The first column is generated by the DTM method [16], the second by the MPPCA+SF method [16], and the third and fourth by our SNM method. MPPCA+SF completely misses the biker in Figure 11-(b). DTM detects nearly all of the abnormal events, but its foreground mask is too large to be accurate (first column of Figure 11). Our method detects (third column) and tracks (fourth column) abnormal objects such as bikers, skaters and small carts robustly and with more accurate masks.
Overall, SNM outperforms the compared state-of-the-art methods on most measures; the exception is the frame-level evaluation on UCSD Ped1, where Sparse [23] is marginally better. We attribute the high anomaly localization rate to the hierarchical construction of spatio-temporal cuboids at different spatial and temporal scales.
Figure 10: Frame-level Equal Error Rate of UCSD Ped1 and UCSD Ped2 datasets.
Table 4: Quantitative comparison of the AUC of the tested abnormality detection algorithms on the UCSD Ped1 and UCSD Ped2 datasets, together with the average over the two datasets. A dash marks an unavailable result.
Anomaly Detection Experiment: AUC
Algorithm | UCSD Ped1 | UCSD Ped2 | Average
DTM [16] | 81.8% | 84.8% | 83.3%
SF [10] | 67.5% | 62.3% | 64.9%
MPPCA [45] | 59.0% | 77.4% | 68.2%
MPPCA+SF [16] | 66.8% | 71.0% | 68.9%
Adam et al. [46] | – | 63.4% | 63.4%
Sparse [23] | 86.0% | 86.1% | 86.1%
SNM | 85.5% | 87.9% | 86.7%
Table 5: Quantitative comparison of the rate of detection (RD) at the equal error point for the anomaly localization task on UCSD Ped1. Our SNM approach achieves the highest detection rate among the compared methods.
Anomaly Localization Experiment: Rate of Detection
Algorithm | Rate of Detection (%)
DTM [16] | 45
SF [10] | 21
MPPCA [45] | 18
MPPCA+SF [16] | 28
Adam et al. [46] | 24
Sparse [23] | 46
SNM | 48.5
Figure 11: Examples of abnormal detections using (i) the DTM approach [16], (ii) the MPPCA+SF approach [16], (iii) our detection approach and (iv) our tracking approach. For DTM, the foreground mask of the detected abnormality is too large, so its results are not accurate. MPPCA+SF inaccurately detects the small cart in (a), completely misses the bike in (b), completely misses the skater in (c) and produces a spurious abnormality at the near end of the camera view in (c). In contrast, our approach using the social network model detects these anomalies with more accurate masks.
Figure 12: The ROC curves of different spatio-temporal scales (2 x 2, 4 x 4 and 8 x 8) on UCSD Ped2 dataset.
Table 6: The AUC of different spatio-temporal scales (2 x 2, 4 x 4 and 8 x 8) on UCSD Ped2 dataset.
Spatio-temporal partitioning scales
Scale | AUC
2 x 2 | 82.7%
4 x 4 | 86.6%
8 x 8 | 87.9%
Figure 13: Anomaly detection in UCD dataset. Frames from the video sequences that represent normal crowd behavior are shown in the first row. Examples of frames containing anomalies are shown in the second row for the GMM method and in the third row for the proposed method (SNM).
Spatio-temporal partitioning at different scales: To evaluate the impact of the spatio-temporal scale, we experiment with 2 x 2, 4 x 4 and 8 x 8 partitioning on the UCSD Ped2 dataset. The AUC values are compared in Table 6 and the ROC curves in Figure 12. The 8 x 8 partitioning achieves the best result (87.9%), the 4 x 4 partitioning is slightly lower (86.6%), and the 2 x 2 partitioning produces the worst result (82.7%), because it provides only a coarse view of the scene.
C. Comparison
From the above experiments we observe the following:
(i) SNM is general: it covers both local and global anomalous events. In contrast, Adam’s work [46] detects only local abnormal events, using Gaussian mixture models. In addition, SF [10] is a spatial abnormality technique, while MPPCA [45] is a temporal abnormality technique.
(ii) SNM is an unsupervised method, whereas Sparse [23] requires a pre-learnt dictionary and the MPPCA+SF approach [16] requires a large training dataset. Moreover, the performance of MPPCA+SF [16] degrades if the training dataset is small. In addition, the social force model [10] uses offline learning.
(iii) SNM extends to online event detection via an incremental update mechanism. Although Sparse [23] also supports online event detection, its training is completely offline. In addition, DTM [16] is an offline approach.
D. UCD Dataset
For this dataset, we compared SNM with the Gaussian mixtures model GMM [21] and the crowd segmentation model CSM [4], based on the anomaly detection ground truth. The performance is measured by the detection accuracy rate. A quantitative comparison of SNM with the ground truth is shown in Table 7. SNM achieves a higher anomaly detection accuracy than CSM in all four video segments. For anomaly localization, we compared our results to the GMM [21] method. Figure 13 shows the results obtained on the UCD dataset. Frames from video scenes in which the crowd exhibits normal behavior are shown in the first row. The second row shows the results of GMM [21] and the third row the results of SNM. The anomalous behaviors are a student running from bottom left to top right and a group of four students running from left to right. Both events are identified as anomalous since they deviate from the dominant crowd motion. Although GMM [21] correctly identifies the anomalous behavior, highlighted by the red dots, SNM highlights the anomalous region of interest comprehensively. Such dense coverage of the region of interest leads to better tracking and better performance in identifying the anomalous frames, as shown in Table 7. As mentioned before, the compositional information of a video enables the method to handle illumination variations, as shown in Figure 14. A person leaving a shop and moving in a direction opposite to the crowd motion (first row in Figure 14) has been detected (second row in Figure 14), tracked and successfully identified as an anomalous motion pattern (last row in Figure 14).
Table 7: Comparison of our method with the CSM method based on the UCD ground truth in anomaly detection.
Anomaly Detection Experiment: Percent Accuracy
Segment No. | Ground Truth Frames | SNM Detection Results (Frames) | SNM Accuracy | CSM Accuracy [4]
Segment 1 | 2015 - 2655 | 2027 - 2645 | 96.6% | 93.7%
Segment 2 | 2186 - 2570 | 2195 - 2545 | 91.1% | 88.5%
Segment 3 | 1826 - 2286 | 1856 - 2257 | 87.2% | 82.6%
Segment 4 | 2225 - 2755 | 2238 - 2713 | 89.6% | 84.9%
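The paper does not spell out how the percent accuracy is computed. As a sketch under that caveat, the SNM accuracy values for the four segments are numerically consistent with the fraction of the ground-truth frame interval that is covered by the detected interval. The function name and this interpretation are assumptions made for the example.

```python
def detection_accuracy(truth, detected):
    """Percentage of the ground-truth frame interval covered by the detection.

    Both arguments are (first_frame, last_frame) pairs. This formula is an
    assumption that reproduces the reported SNM figures; it is not stated
    in the text.
    """
    t0, t1 = truth
    d0, d1 = detected
    overlap = max(0, min(t1, d1) - max(t0, d0))
    return 100.0 * overlap / (t1 - t0)


# (ground-truth frames, SNM detection frames, reported SNM accuracy in %)
SEGMENTS = {
    1: ((2015, 2655), (2027, 2645), 96.6),
    2: ((2186, 2570), (2195, 2545), 91.1),
    3: ((1826, 2286), (1856, 2257), 87.2),
    4: ((2225, 2755), (2238, 2713), 89.6),
}

for seg, (truth, detected, reported) in SEGMENTS.items():
    computed = detection_accuracy(truth, detected)
    print(f"Segment {seg}: computed {computed:.2f}%, reported {reported}%")
```

Under this reading, an accuracy below 100% means that the detection starts later or ends earlier than the annotated interval; frames flagged outside the annotated interval would not be penalized, which is a limitation of the measure as sketched here.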
E. Spatio-temporal partitioning at different scales
Similarly, we tested the influence of different spatio-temporal scales, 2 x 2, 4 x 4 and 8 x 8, on video segment 1 of the UCD dataset. The percent accuracy results are tabulated in Table 8. The 4 x 4 and 8 x 8 partitionings produce similar results (96.6% each), but 8 x 8 partitioning consumes more computational time than 4 x 4 partitioning. For the UCD dataset, a higher scale gives better accuracy than a lower scale because the video is crowded. That is, finer resolutions tend to capture the details of a crowded video better than coarser resolutions.
Figure 14: Anomalous object detected successfully under illumination variation. The first row shows sample frames in which the object is present. The second row shows the detection of the anomalous object, which is tracked in the third row.
Table 8: The accuracy results of different spatio-temporal scales (2 x 2, 4 x 4 and 8 x 8) on UCD dataset.
Spatio-temporal partitioning scales – Segment 1
Scale | SNM Detection Results (Frames) | SNM Accuracy
2 x 2 | 2035 - 2597 | 87.8%
4 x 4 | 2027 - 2645 | 96.6%
8 x 8 | 2026 - 2643 | 96.6%
5. Conclusion
The proposed social network model, SNM, captures scene dynamics and crowd interactions spatially and temporally by modeling crowd scenes as a social network. In our experiments on a set of benchmark crowd analysis video sequences, SNM achieved higher detection accuracy than most of the compared state-of-the-art methods in detecting and localizing anomalies in crowd scenes. Moreover, SNM allows adaptive partitioning of crowd scenes, so that fine anomalous events can be detected at the level of detail required by an application.
References:
[1] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Computing Surveys (CSUR), no. 3, September 2009.
[2] J. Junior, S. Musse, and C. Jung, “Crowd Analysis using Computer Vision Techniques,” IEEE Signal Processing Magazine, vol. 27, no. 5, pp. 66–77, 2010.
[3] S. Saxena, F. Brémond, M. Thonnat, and R. Ma, “Crowd behavior recognition for video surveillance,” Advanced Concepts for Intelligent Vision Systems, pp. 1–12, 2008.
[4] H. Ullah and N. Conci, “Crowd motion segmentation and anomaly detection via multi-label optimization,” ICPR workshop on Pattern Recognition and Crowd Analysis, 2012.
[5] A. Basharat, A. Gritai, and M. Shah, “Learning object motion patterns for anomaly detection and improved object detection,” 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8, Jun. 2008.
[6] H. Ullah and N. Conci, “Structured Learning for Crowd Motion Segmentation,” in IEEE Conference on Image Processing (ICIP), pp. 824–828, 2013.
[7] R. Mazzon, S.F. Tahir, and A. Cavallaro, “Person re-identification in crowd,” Pattern Recognition Letters, vol. 33, no. 14, pp. 1828–1837, Oct. 2012.
[8] W. Ge and R.T. Collins, “Marked point processes for crowd counting,” 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 2913–2920, Jun. 2009.
[9] Z. Wang, H. Liu, Y. Qian, and T. Xu, “Crowd Density Estimation Based on Local Binary Pattern Co-Occurrence Matrix,” 2012 IEEE International Conference on Multimedia and Expo Workshops, pp. 372–377, Jul. 2012.
[10] R. Mehran, A. Oyama, and M. Shah, “Abnormal crowd behavior detection using social force model,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), no. 1, pp. 935–942, 2009.
[11] R. Raghavendra, A.D. Bue, and M. Cristani, “Optimizing interaction force for global anomaly detection in crowded scenes,” 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), no. 1, pp. 136–143, 2011.
[12] S. Ali and M. Shah, “Floor fields for tracking in high density crowd scenes,” Computer Vision–ECCV, pp. 1–14, 2008.
[13] S. Ali and M. Shah, “A lagrangian particle dynamics approach for crowd flow segmentation and stability analysis,” IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–7, 2007.
[14] J. Feng, C. Zhang, and P. Hao, “Online Learning with Self-Organizing Maps for Anomaly Detection in Crowd Scenes,” 2010 20th International Conference on Pattern Recognition, pp. 3599–3602, Aug. 2010.
[15] F. Jiang, Y. Wu, and A.K. Katsaggelos, “Detecting contextual anomalies of crowd motion in surveillance video,” 2009 16th IEEE International Conference on Image Processing (ICIP), pp. 1117–1120, 2009.
[16] V. Mahadevan, W. Li, and V. Bhalodia, “Anomaly detection in crowded scenes,” 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1975–1981, 2010.
[17] V. Reddy, C. Sanderson, and B.C. Lovell, “Improved anomaly detection in crowded scenes via cell-based analysis of foreground speed, size and texture,” 2011 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 55–61, 2011.
[18] M.J.V. Leach, E.P. Sparks, and N.M. Robertson, “Contextual anomaly detection in crowded surveillance scenes,” Pattern Recognition Letters, vol. 44, pp. 71–79, Jul. 2013.
[19] V. Saligrama and Z. Chen, “Video Anomaly Detection Based on Local Statistical Aggregates,” 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2112–2119, 2012.
[20] H. Ullah, M. Ullah, and N. Conci, “Real-time anomaly detection in dense crowded scenes,” SPIE-Video Surveillance and Transportation Imaging Applications, vol. 9026, pp. 902608–902608, Mar. 2014.
[21] H. Ullah, L. Tenuti, and N. Conci, “Gaussian mixtures for anomaly detection in crowded scenes,” IS&T/SPIE Electronic Imaging, pp. 866303–866303, Mar. 2013.
[22] X. Cui, Q. Liu, M. Gao, and D.N. Metaxas, “Abnormal detection using interaction energy potentials,” 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3161–3167, Jun. 2011.
[23] Y. Cong, J. Yuan, and J. Liu, “Sparse reconstruction cost for abnormal event detection,” 2011 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3449–3456, Jun. 2011.
[24] O. Ozturk, T. Yamasaki, and K. Aizawa, “Detecting Dominant Motion Flows in Unstructured/Structured Crowd Scenes,” 2010 20th International Conference on Pattern Recognition (ICPR), pp. 3533–3536, Aug. 2010.
[25] D. Neelima and K.L. Rao, “A Moving Object Tracking and Velocity Determination,” International Journal of Advanced Engineering Sciences and Technologies (IJAEST), vol. 11, no. 1, pp. 96–100, 2011.
[26] W. Chongjing, Z. Xu, Z. Yi, and L. Yuncai, “Analyzing motion patterns in crowded scenes via automatic tracklets clustering,” Communications, China 10, no. 4, April, pp. 144–154, 2013.
[27] C. Stauffer and W.E.L. Grimson, “Adaptive background mixture models for real-time tracking,” Proceedings 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat No PR00149), vol. 2, pp. 246–252, 1999.
[28] J.Y. Bouguet, “Pyramidal implementation of the affine lucas kanade feature tracker description of the algorithm,” Intel Corporation, vol. 1, no. 2, pp. 1–9, 2001.
[29] D.T. Schmitt, S.H. Kurkowski, and M.J. Mendenhall, “Building Social Networks in Persistent Video Surveillance,” 2009 IEEE International Conference on Intelligence and Security Informatics, pp. 217–219, 2009.
[30] J. Karamon, Y. Matsuo, H. Yamamoto, and M. Ishizuka, “Generating social network features for link-based classification,” Knowledge Discovery in Databases: PKDD 2007, pp. 127–139, 2007.
[31] C. Li and S. Lin, “Social flocks: a crowd simulation framework for social network generation, community detection, and collective behavior modeling,” Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 765–768, 2011.
[32] T. Yu, S. Lim, K. Patwardhan, and N. Krahnstoever, “Monitoring, recognizing and discovering social networks,” 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1462–1469, 2009.
[33] U. von Luxburg, “A tutorial on spectral clustering,” Statistics and Computing, vol. 17, no. 4, pp. 395–416, Aug. 2007.
[34] G. Karypis, E. Han, and V. Kumar, “Chameleon: Hierarchical clustering using dynamic modeling,” Computer, no. 8, pp. 68–75, 1999.
[35] R. Xu and D.C. Wunsch, “Clustering,” IEEE Press Ser. Comput. Intell., 2009.
[36] M. Zeppelzauer, M. Zaharieva, D. Mitrovic, and C. Breiteneder, “A novel trajectory clustering approach for motion segmentation,” Advances in Multimedia Modeling, pp. 433–443, 2010.
[37] D. Sugimura, K.M. Kitani, T. Okabe, Y. Sato, and A. Sugimoto, “Using individuality to track individuals: Clustering individual trajectories in crowds using local appearance and frequency trait,” 2009 IEEE 12th International Conference on Computer Vision, pp. 1467–1474, Sep. 2009.
[38] A. Gaidon, Z. Harchaoui, and C. Schmid, “Recognizing activities with cluster-trees of tracklets,” BMVC, 2012.
[39] C. Tomasi and T. Kanade, “Detection and tracking of point features,” International Journal of Computer Vision, no. 7597, 1991.
[40] H. Dee and A. Caplier, “Crowd behaviour analysis using histograms of motion direction,” in Proceedings of the 17th IEEE International Conference on Image Processing, pp. 1545–1548, 2010.
[41] S. Wu, B. Moore, and M. Shah, “Chaotic invariants of lagrangian particle trajectories for anomaly detection in crowded scenes,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2054–2060, 2010.
[42] H. Dee and A. Caplier, “Crowd behaviour analysis using histograms of motion direction,” in Proceedings of the 17th IEEE International Conference on Image Processing, pp. 1545–1548, 2010.
[43] F. Santoro, S. Pedro, Z. Tan, and T. Moeslund, “Crowd analysis by using optical flow and density based clustering,” 18th European Signal Processing Conference (EUSIPCO), pp. 269–273, 2010.
[44] L.C. Freeman, “Centrality in social networks conceptual clarification,” Social networks, vol. 1, no. 1968, pp. 215–239, 1979.
[45] J. Kim and K. Grauman, “Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2921–2928, June 2009.
[46] A. Adam, E. Rivlin, I. Shimshoni, and D. Reinitz, “Robust real-time unusual event detection using multiple fixed-location monitors,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 3, pp. 555–60, Mar. 2008.
[47] M. Thida, H.-L. Eng, and P. Remagnino, “Laplacian Eigenmap With Temporal Constraints for Local Abnormality Detection in Crowded Scenes,” IEEE Transactions on Cybernetics, vol. 43, no. 6, pp. 2147–2156, Dec. 2013. doi: 10.1109/TCYB.2013.2242059
[48] Y. Yuan, J. Fang, and Q. Wang, “Online Anomaly Detection in Crowd Scenes via Structure Analysis,” IEEE Transactions on Cybernetics, vol. 45, no. 3, pp. 562–575, March 2015. doi: 10.1109/TCYB.2014.2330853
[49] D. Helbing and P. Molnar, “Social force model for pedestrian dynamics,” Physical Review E, 51:4282, 1995.
[50] D. Wyatt, “Collective Modeling of Human Social Behavior,” AAAI Spring Symposium: Human Behavior Modeling, 2009.
[51] R. Chaker, I.N. Junejo, and Z. Al-Aghbari, “Crowd Modeling Using Social Networks,” IEEE International Conference on Image Processing (ICIP), 2015.