An effective implementation of Gaussian of Gaussian descriptor for person re-identification

Thuy-Binh Nguyen∗†, Duc-Long Tran∗, Thi-Lan Le∗, Thi Thanh Thuy Pham∗‡, Huong-Giang Doan∗§

∗ Computer Vision Department, MICA International Research Institute, Hanoi University of Science and Technology, Vietnam. Email: [email protected]
† Faculty of Electrical and Electronics Engineering, University of Transport and Communications, Hanoi, Vietnam
‡ Faculty of Security and Information Technology, Academy of People Security
§ Faculty of Control and Automation, Electrical Power University, Hanoi, Vietnam

Abstract—Person re-identification (ReID), a critical task in surveillance systems, has achieved impressive advances in recent years. However, most current works focus on improving re-identification accuracy, which makes their direct practical use difficult, even infeasible. Among the features proposed for person representation in ReID, Gaussian of Gaussian (GOG) has proved to be robust. Towards applying this feature in practice, we propose two improvements. First, we re-implement GOG and perform intensive experiments to select its optimal parameters during feature extraction. Second, we propose and apply preprocessing techniques on person images. The experimental results show that the proposed approach extracts GOG features two times faster than the available source code and achieves remarkably high accuracy for person ReID. The obtained rank-1 accuracies on the VIPeR dataset are 51.74% (with background) and 57.25% (without background). The implementations and evaluation datasets used in this paper are made publicly available.

I. INTRODUCTION

In recent years, the person ReID problem has become an increasingly interesting research topic in computer vision due to its wide range of applications, such as robotics, industrial and security surveillance, etc. [1]. Re-identification is the process of identifying a person viewed across different fields of view (FOVs) of different cameras; that is, the ReID process matches the image of each person captured from one FOV against the others. In most person ReID experiments, the matching is done between an image or a group of images of an unknown person in a probe set and the pedestrians in a gallery set. Depending on the number of images in the probe and gallery sets, person ReID can be classified into two main approaches: single-shot and multiple-shot. In the single-shot approach, each person has only one image in both the probe and gallery sets, while in the multiple-shot approach, each person has at least one image in the probe set and a group of images in the gallery set.

In general, a person ReID system includes two critical stages: feature extraction and person matching. In the first stage, a robust descriptor of the person's appearance is built. In the second stage, the feature vectors built in the first stage are compared. This comparison is done by a similarity function based on a metric distance, such as the Euclidean distance (L2), the

Cosine distance, or a distance computed after projecting the features into a subspace where they are easier to distinguish.

Person re-identification (ReID) has achieved impressive advances in recent years [1]. However, most current works focus on improving re-identification accuracy and pay little attention to computational time. Meanwhile, computational time is one of the most important elements in practical applications of person ReID, such as person tracking, localization, etc. From this point of view, in this work we propose to re-implement the robust Gaussian of Gaussian (GOG) descriptor [2] in C++. This makes the GOG feature prospective for practical person ReID in real-time camera surveillance systems. In addition, to improve the effectiveness of the GOG descriptor, we select its optimal parameters and apply preprocessing techniques on person images. The experimental results show that the proposed approach extracts GOG features two times faster than the available source code. We evaluate the proposed method in the single-shot scenario on the VIPeR dataset and achieve remarkably high accuracy for person ReID. The obtained rank-1 accuracies on VIPeR are 51.74% (with background) and 57.25% (without background).

The rest of this paper is organized as follows. Section II discusses works related to our approach. The overall proposed framework is presented in Section III. Section IV provides the results obtained from several extensive experiments. Finally, conclusions and future work are presented in the last section.

II. RELATED WORK

Although many enhancements for person ReID have been proposed, it remains a challenging problem in computer vision. Different illumination conditions, background clutter, occlusion caused by crowded scenes, and pose changes across different viewpoints are the main challenges for person ReID in a camera network. To overcome them, improvements are mainly applied to the two critical phases of feature extraction and metric learning. Chang et al. [3] extracted features and built feature vectors on different parts of a foreground image. These vectors are then concatenated to represent a person image. In [4],

Zhao et al. solved person ReID by finding salient features within an image, which form an important signature for matching persons. Concerning visual colors, Yang et al. [5] provided a novel salient color names based color descriptor (SCNCD) for describing a person image. In this method, a higher probability is assigned to the color name that is nearer to the visual color. To overcome the challenge caused by different viewpoints, Liao et al. [6] focused on the horizontal occurrence of local features and maximized it to build a robust descriptor named LOMO. Matsukawa et al. [2] claimed that mean and covariance are the most important cues for representing a person's appearance. They proposed the Gaussian of Gaussian (GOG) descriptor, which inherits the advantages of covariance-based descriptors.

Recent years have witnessed the impressive results of Convolutional Neural Networks (CNNs) in pattern recognition. For person ReID, CNNs and their variants are deployed to achieve higher performance. In [7], the authors combined hand-crafted and deep learning features to build an effective descriptor called Feature Fusion Network (FNN). Cheng et al. [8] introduced a multi-channel CNN model (MCCNN) that learns both global full-body and local body-part features; the features are integrated to generate a final signature for each person image. Ding et al. [9] provided a feature driven by scalable distance. Their learning framework aims at maximizing the relative distance between matched and mis-matched pairs of person images, which brings higher person ReID performance.

Besides building effective descriptors, some works have concentrated on learning distance metrics that minimize the distance between intra-class objects and maximize the distance between extra-class objects. Some prominent proposals are the Keep It Simple and Straightforward Metric (KISSME) [10], Bayesian face [11], XQDA [6], and RankSVM [12]. XQDA can be considered an extension of Bayesian face [11] and KISSME [10], in which a discriminative subspace is learned together with a metric.

As presented above, most studies focus on improving the matching rate and do not address computation time or solutions that make person ReID more practical. A few works take a practical approach to person ReID [13], [14], but their authors still concentrate on achieving higher performance rather than reducing processing time. This is the motivation for us to propose a solution that makes person ReID more realistic.

III. PROPOSED METHOD

A. Overall framework

Figure 1 shows the proposed framework, which includes three main steps: preprocessing, feature extraction, and metric learning. Although GOG is effective for person representation, it still has difficulty overcoming the challenge of illumination variation. To reduce this effect, image preprocessing is applied on the person images. The multi-scale Retinex algorithm [15] is used here to generate color images that are consistent with human perception. The GOG feature is then extracted

from the preprocessed person images. Finally, XQDA is used to learn the distance metric between two person images.

Fig. 1. The proposed framework towards practical applications of person ReID (gallery and probe images both pass through preprocessing and feature extraction; metric learning then outputs the person ID).

B. Image preprocessing

To reduce the effect of illumination variation on the person ReID process in a camera network, we integrate an image preprocessing step into the pipeline of the person ReID system. The Multi-Scale Retinex with Color Restoration (MSRCR) algorithm [15] is chosen for this step because it generates color images that are robust to illumination changes. The multi-scale Retinex algorithm integrates different Single-Scale Retinex (SSR) algorithms [16], such as the small-scale Retinex for dynamic range compression and the large-scale Retinex for tonal rendition, with a universally applied color restoration. First, the original image is processed in the logarithmic space to focus on relative details. Then, the processed image is smoothed by a 2D convolution with a Gaussian surround function. Finally, an enhanced image is obtained by removing the smooth parts. In this way, the algorithm deals with both color constancy and dynamic range compression, which brings images captured by cameras approximately into line with human visual perception. In this paper, we use three scales σ ∈ {15, 80, 250} for the Retinex algorithm. Figure 2 shows example pairs of images from the VIPeR dataset, including the original images and the processed ones.
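For concreteness, a minimal C++/OpenCV sketch of the MSRCR step is given below. It follows the structure described above (log-space Retinex at the three scales, color restoration, gain/offset stretch); the color-restoration constants alpha and beta are common illustrative choices, not necessarily the exact values of [15] or of our implementation.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

cv::Mat msrcr(const cv::Mat& bgr8u,
              const std::vector<double>& sigmas = {15.0, 80.0, 250.0}) {
    cv::Mat img;
    bgr8u.convertTo(img, CV_32F, 1.0, 1.0);   // +1 avoids log(0)

    cv::Mat logImg;
    cv::log(img, logImg);

    // Multi-scale Retinex: average of log(I) - log(Gaussian * I).
    cv::Mat msr = cv::Mat::zeros(img.size(), CV_32FC3);
    for (double sigma : sigmas) {
        cv::Mat blurred, logBlur;
        cv::GaussianBlur(img, blurred, cv::Size(0, 0), sigma);
        cv::log(blurred, logBlur);
        msr += (logImg - logBlur) / static_cast<double>(sigmas.size());
    }

    // Color restoration: beta * (log(alpha * I_c) - log(I_R + I_G + I_B)).
    std::vector<cv::Mat> ch(3);
    cv::split(img, ch);
    cv::Mat sum = ch[0] + ch[1] + ch[2], logSum;
    cv::log(sum, logSum);
    const double alpha = 125.0, beta = 46.0;  // illustrative constants
    for (int c = 0; c < 3; ++c) {
        cv::Mat logC;
        cv::log(alpha * ch[c], logC);
        ch[c] = beta * (logC - logSum);
    }
    cv::Mat crf;
    cv::merge(ch, crf);
    msr = msr.mul(crf);

    // Gain/offset stretch back to a displayable 8-bit range.
    cv::Mat out;
    cv::normalize(msr, out, 0, 255, cv::NORM_MINMAX);
    out.convertTo(out, CV_8UC3);
    return out;
}
```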

Fig. 2. Example pairs of images from the VIPeR dataset; images in the same column represent the same person. (a) The original images; (b) images processed by Multi-Scale Retinex.

C. GOG extraction

The GOG (Gaussian Of Gaussian) descriptor [2] is based on hierarchical Gaussian distributions of pixel-level features. Its advantage is that it keeps the mean and covariance information, which is useful for representing a pedestrian image. The pipeline for GOG extraction is illustrated in Fig. 3. It shows the nature of the twice-repeated Gaussian modeling, once at the patch level and once at the region level. As depicted in Fig. 3a, a person image is divided into N overlapping regions corresponding to N horizontal stripes. Within each horizontal stripe, dense patches (k × k) are extracted at p-pixel intervals, and these patches are represented by a Gaussian function of pixel feature vectors. The details of this process are described in the following subsections.

Fig. 3. (a) A person image is divided into patches and regions; (b) pipeline for GOG feature extraction: pixel features (location, gradient, color) → patch Gaussians → flattening on the Riemannian manifold → region Gaussians → flatten and concatenate.

1) Pixel feature extraction: At the pixel level, a d-dimensional feature vector is created by extracting characteristics of each pixel. This feature vector can contain any information about a given pixel, such as color, intensity, texture, etc. In [2], the authors extracted an 8-dimensional pixel feature vector (d = 8) containing the vertical location, the gradient, and the color of the pixel in different color spaces. The descriptor is given by the following equation:

f_i = [y, M_{0°}, M_{90°}, M_{180°}, M_{270°}, R, G, B]^T,   (1)

where y is the pixel location in the vertical direction. Since the parts of the human body are well aligned in the vertical direction, the y-coordinate of a given pixel is chosen to provide the location information. M_{θ∈{0°, 90°, 180°, 270°}} are the magnitudes of the pixel intensity gradient along four orientations. The gradient orientation is calculated from the intensity image as O = arctan(I_y / I_x), where I_x and I_y are the derivatives along the horizontal and vertical directions. These orientations are quantized into 4 bins, O_{θ∈{0°, 90°, 180°, 270°}}. (R, G, B) are the color channel values. In this paper, we evaluate the effect of the number of gradient bins on the recognition rate of person ReID. In addition, different color spaces are employed, such as RGB, Lab, HSV, and the normalized color space nRGB with nR = R/(R + G + B). Because of redundant information, only {nR, nG} are used in this color space.
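The following sketch, assuming OpenCV, illustrates how the 8-dimensional pixel feature of Eq. (1) can be computed. For brevity it hard-assigns each gradient magnitude to the nearest of the 4 orientation bins, whereas soft voting between neighboring bins is also possible; the function name is illustrative.

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>

// Returns an (H*W) x 8 matrix; row i is f_i = [y, M0, M90, M180, M270, R, G, B].
cv::Mat pixelFeatures(const cv::Mat& bgr8u) {
    cv::Mat gray, gx, gy;
    cv::cvtColor(bgr8u, gray, cv::COLOR_BGR2GRAY);
    cv::Sobel(gray, gx, CV_32F, 1, 0);   // I_x
    cv::Sobel(gray, gy, CV_32F, 0, 1);   // I_y

    cv::Mat feats(bgr8u.rows * bgr8u.cols, 8, CV_32F, cv::Scalar(0));
    for (int y = 0; y < bgr8u.rows; ++y) {
        for (int x = 0; x < bgr8u.cols; ++x) {
            float* f = feats.ptr<float>(y * bgr8u.cols + x);
            f[0] = static_cast<float>(y);              // vertical location
            float dx = gx.at<float>(y, x), dy = gy.at<float>(y, x);
            float mag = std::hypot(dx, dy);
            float ang = std::atan2(dy, dx);            // (-pi, pi]
            if (ang < 0) ang += 2.0f * static_cast<float>(CV_PI);
            int bin = static_cast<int>(ang / (CV_PI / 2.0) + 0.5) % 4;
            f[1 + bin] = mag;                          // M_{0,90,180,270}
            cv::Vec3b px = bgr8u.at<cv::Vec3b>(y, x);
            f[5] = px[2]; f[6] = px[1]; f[7] = px[0];  // R, G, B (BGR input)
        }
    }
    return feats;
}
```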

2) Patch-level feature extraction: The feature vectors created by pixel-level feature extraction are modeled by fitting a Gaussian distribution to each patch. A Gaussian patch is denoted as N(f; μ_s, Σ_s):

N(f; μ_s, Σ_s) = exp(−(1/2)(f − μ_s)^T Σ_s^{−1} (f − μ_s)) / ((2π)^{d/2} |Σ_s|^{1/2}),   (2)

where |Σ_s| is the determinant of the covariance matrix, μ_s is the mean vector, μ_s = (1/n_s) Σ_{i∈Ω_s} f_i, and Σ_s is the covariance matrix of the sampled patch s, Σ_s = (1/(n_s − 1)) Σ_{i∈Ω_s} (f_i − μ_s)(f_i − μ_s)^T, where Ω_s is the area of a given patch s and n_s is the number of pixels in Ω_s.

The Gaussian patches are then flattened into a Euclidean space by projecting them onto a tangent space of the Riemannian manifold. This paves the way for calculating Euclidean distances, which cannot be computed directly in the probability distribution space of the Riemannian manifold. In [2], the authors proposed to utilize Symmetric Positive Definite (SPD) matrices and the log-Euclidean metric for mapping the manifold to a Euclidean tangent space. A Gaussian patch is embedded into an SPD matrix in the way introduced in [17], as follows:

N(f; μ_s, Σ_s) ∼ P_s = |Σ_s|^{−1/(d+1)} [ Σ_s + μ_s μ_s^T , μ_s ; μ_s^T , 1 ].   (3)

Due to the symmetry of an SPD matrix, the upper triangular part of the mapped matrix is sufficient to represent a Gaussian patch, and these values are aggregated to form an m-dimensional vector g_s, with m = (d² + 3d)/2 + 1. In addition, to reduce the negative influence of the background, Matsukawa et al. [2] introduced a weight for each patch, defined as ω_s = exp(−(x_s − x_c)²/(2σ²)), where x_c = W/2, σ = W/4, and W is the width of a person image.
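A sketch of this patch-level computation is given below, assuming OpenCV: it estimates the patch Gaussian, builds the SPD embedding of Eq. (3), applies the matrix logarithm via an eigen-decomposition, and keeps the upper triangle as g_s. The ridge term and the √2 scaling of off-diagonal entries are common log-Euclidean conventions and may differ in detail from the reference implementation of [2].

```cpp
#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// patchFeats: n_s x d matrix whose rows are the pixel features of one patch.
std::vector<float> gaussianPatchVector(const cv::Mat& patchFeats) {
    const int d = patchFeats.cols;
    cv::Mat sigma, mu;
    cv::calcCovarMatrix(patchFeats, sigma, mu,
                        cv::COVAR_NORMAL | cv::COVAR_ROWS | cv::COVAR_SCALE,
                        CV_32F);
    mu = mu.t();                                       // d x 1 mean vector
    sigma += 1e-4f * cv::Mat::eye(d, d, CV_32F);       // ridge for stability

    // SPD embedding P_s of Eq. (3).
    cv::Mat P(d + 1, d + 1, CV_32F);
    cv::Mat topLeft = sigma + mu * mu.t();
    topLeft.copyTo(P(cv::Rect(0, 0, d, d)));
    mu.copyTo(P(cv::Rect(d, 0, 1, d)));                // last column
    cv::Mat(mu.t()).copyTo(P(cv::Rect(0, d, d, 1)));   // last row
    P.at<float>(d, d) = 1.0f;
    P *= static_cast<float>(
        std::pow(cv::determinant(sigma), -1.0 / (d + 1)));

    // Log-Euclidean flattening: log(P) = U^T diag(log lambda) U, where
    // cv::eigen returns the eigenvectors as the rows of U.
    cv::Mat evals, evecs;
    cv::eigen(P, evals, evecs);
    for (int i = 0; i < evals.rows; ++i)
        evals.at<float>(i) = std::log(std::max(evals.at<float>(i), 1e-6f));
    cv::Mat logP = evecs.t() * cv::Mat::diag(evals) * evecs;

    // Upper triangle -> m = (d^2 + 3d)/2 + 1 dimensional vector g_s;
    // off-diagonal entries scaled by sqrt(2) to preserve the norm.
    std::vector<float> g;
    for (int i = 0; i <= d; ++i)
        for (int j = i; j <= d; ++j)
            g.push_back(i == j ? logP.at<float>(i, j)
                               : std::sqrt(2.0f) * logP.at<float>(i, j));
    return g;
}
```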

3) Region-level feature extraction: After applying a Gaussian distribution at the patch level, it is applied once more at the region level. The Gaussian regions are generated by modeling the Gaussian patches via a Gaussian distribution with the following mean vector and covariance matrix:

μ_G = (1 / Σ_{s∈G} ω_s) Σ_{s∈G} ω_s g_s,   (4)

Σ_G = (1 / Σ_{s∈G} ω_s) Σ_{s∈G} ω_s (g_s − μ_G)(g_s − μ_G)^T.   (5)

The Gaussian regions are then also flattened by embedding them into an SPD matrix, which is mapped to the tangent space in order to extract the upper triangular part, creating an (m² + 3m)/2 + 1 dimensional feature vector denoted as z. Finally, these region vectors are concatenated to form the signature of each person image.
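The region-level step of Eqs. (4) and (5) then reduces to a weighted mean and covariance of the patch vectors g_s, as sketched below (function and variable names are illustrative).

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// patchVecs: one m-dimensional column vector g_s per patch in the region;
// weights: the corresponding patch weights w_s.
void regionGaussian(const std::vector<cv::Mat>& patchVecs,
                    const std::vector<float>& weights,
                    cv::Mat& muG, cv::Mat& sigmaG) {
    const int m = patchVecs[0].rows;
    float wSum = 0.0f;
    muG = cv::Mat::zeros(m, 1, CV_32F);
    for (size_t s = 0; s < patchVecs.size(); ++s) {    // Eq. (4)
        muG += weights[s] * patchVecs[s];
        wSum += weights[s];
    }
    muG /= wSum;

    sigmaG = cv::Mat::zeros(m, m, CV_32F);
    for (size_t s = 0; s < patchVecs.size(); ++s) {    // Eq. (5)
        cv::Mat d = patchVecs[s] - muG;
        sigmaG += weights[s] * d * d.t();
    }
    sigmaG /= wSum;
    // The region Gaussian (muG, sigmaG) is then embedded into an SPD
    // matrix and flattened exactly as at the patch level (Eq. (3)).
}
```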

D. Person matching

XQDA [6] treats multi-class classification as a binary one. It focuses on cross-view metric learning between the intra-personal variations Ω_I and the extra-personal variations Ω_E. The task of XQDA is to learn both a discriminative subspace from cross-view data and a distance function. Equation (6) presents the distance function in the subspace W:

d_W(x, z) = (x − z)^T M(W) (x − z),   (6)

where d_W(x, z) is the distance between two cross-view samples x and z, and M(W) is the kernel matrix

M(W) = W ((W^T Σ_I W)^{−1} − (W^T Σ_E W)^{−1}) W^T,   (7)

with Σ_I and Σ_E the covariance matrices of Ω_I and Ω_E, respectively.
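Assuming the subspace W and the covariance matrices have already been learned on training data as in [6], the distance of Eqs. (6) and (7) can be computed as sketched below. In practice, M(W) would be computed once and reused for all probe-gallery pairs rather than rebuilt per pair as done here for clarity.

```cpp
#include <opencv2/opencv.hpp>

// x, z: D x 1 feature vectors (CV_64F); W: D x r learned subspace;
// sigmaI, sigmaE: D x D intra-/extra-personal covariance matrices.
double xqdaDistance(const cv::Mat& x, const cv::Mat& z,
                    const cv::Mat& W,
                    const cv::Mat& sigmaI, const cv::Mat& sigmaE) {
    // M(W) = W ((W^T Sigma_I W)^{-1} - (W^T Sigma_E W)^{-1}) W^T, Eq. (7).
    cv::Mat M = W * ((W.t() * sigmaI * W).inv() -
                     (W.t() * sigmaE * W).inv()) * W.t();
    cv::Mat diff = x - z;                               // Eq. (6)
    return cv::Mat(diff.t() * M * diff).at<double>(0, 0);
}
```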

IV. EXPERIMENTAL RESULTS

A. Dataset and evaluation metric

To compare with the original work [2], we conducted extensive experiments on the VIPeR (Viewpoint Invariant Pedestrian Recognition) dataset [18]. This is one of the most challenging datasets, with strong variations in pose, illumination, viewpoint, and occlusion. The dataset contains 1,264 images of 632 persons, with an image resolution of 128×48. Each person has a pair of images captured by two different non-overlapping cameras. Among the 632 persons, 316 are used for training XQDA, while the other 316 are used for testing. Besides the experiments on the original VIPeR images, we used the VIPeR Foreground dataset [19], which contains only the foreground person images, for further testing.

Concerning the evaluation metric, we employ the Cumulative Matching Characteristic (CMC) curve. In a CMC curve, the horizontal axis shows the rank, while the vertical axis shows the rate of true matching at each rank. The value of the CMC curve at rank k is the rate of true matches within the first k ranks. The higher the CMC curve, the better the person re-identification method.
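A minimal sketch of this CMC computation for the single-shot setting is given below; it assumes a precomputed probe-gallery distance matrix and known ground-truth matches (both names are illustrative).

```cpp
#include <vector>

// dist[i][j]: distance between probe i and gallery j;
// gtIndex[i]: index of probe i's true match in the gallery.
std::vector<double> cmcCurve(const std::vector<std::vector<double>>& dist,
                             const std::vector<int>& gtIndex, int maxRank) {
    std::vector<double> cmc(maxRank, 0.0);
    for (size_t i = 0; i < dist.size(); ++i) {
        // 0-based rank of the true match = #gallery entries strictly closer.
        int rank = 0;
        for (size_t j = 0; j < dist[i].size(); ++j)
            if (dist[i][j] < dist[i][gtIndex[i]]) ++rank;
        for (int r = rank; r < maxRank; ++r) cmc[r] += 1.0;
    }
    for (double& v : cmc) v /= dist.size();   // matching rate at each rank
    return cmc;
}
```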

B. Experimental results

It is worth noting that the code for extracting the GOG feature is publicly provided by the authors of [2]. However, they implemented it in Matlab. To apply the GOG feature in practice, we re-implemented the GOG extraction described in Section III in C++, employing the dlib and opencv libraries.

1) Comparison of the performance of the two implementations: We compare the two implementations in terms of ReID accuracy and computational time. Concerning ReID accuracy, Figure 4 shows the CMC curves obtained by the two implementations in the Lab color space. The two CMC curves are very close, approximately overlapping, which indicates that our implementation produces results similar to those of the source code provided by the authors of [2]. Concerning computational time, for one image at resolution 128×48, our implementation takes 0.109 s (∼10 fps) while that of the authors of [2] needs 0.286 s on a computer with an Intel(R) Core(TM) i5-6200U 2.3 GHz CPU and 8 GB RAM-DDR3 1600 MHz. This means our implementation extracts GOG more than 2 times faster than the implementation of [2]. The obtained frame rate of ∼10 fps can satisfy the real-time requirement of several surveillance applications. Figure 5 shows the computational time of each step in GOG feature extraction for one person image. The VIPeR Foreground dataset and the C++ implementation of GOG are available at http://mica.edu.vn/perso/Le-Thi-Lan/images/NICS2018.html.

Fig. 4. ReID accuracy of the source code provided in [2] and that of our implementation (rank-1 on VIPeR, RGB color space: 42.28% for the original code, 41.90% for our re-implementation).

Fig. 5. Computation time (in s) for each step in extracting the GOG feature of one person image with our implementation: pixel feature 0.016, patch flatten 0.074, region flatten 0.018, region combine 0.001.

2) Analysis of the effect of GOG parameters: Several parameters are used in GOG feature extraction. To evaluate their effect on person ReID, we chose two important ones: the number of regions or stripes (N) and the number of gradient bins. Figure 6 shows the matching rate at rank-1 when the number of regions varies from 1 to 49. In [2], the authors fixed the number of regions at 7. It can be observed that using 13, 15, or 17 regions gives the best rank-1 ReID results. However, the results with 15 regions outperform the others at the remaining ranks (e.g., ranks 5, 10, 15, 20). Figure 7 shows the variation of matching rates at the important ranks 1, 5, 10, 15, and 20 when the number of gradient bins is changed. As seen in this figure, the best performance is achieved with 5 gradient bins. Based on the above analysis, in our experiments we set the number of regions to 15 and the number of gradient bins to 5. Other parameters are kept as in the experiments of [2].

Fig. 6. The matching rates at rank-1 with different numbers of regions (N), from 1 to 49.

Fig. 7. The matching rates at ranks 1, 5, 10, 15, and 20 with different numbers of gradient bins (from 4 to 10).

3) Analysis of the performance of the proposed approach: Figure 8 presents the CMC curves evaluated on different color spaces (RGB/Lab/HSV/nRnG) and on the fusion of these color spaces, with the optimal parameters chosen for GOG. The best result (51.04%) is obtained when using 15 regions and fusing all color spaces. It is worth noting that the best result when using the source code provided by the authors of [2] with the default parameters is 49.70%. This shows that choosing the optimal parameters increases the rank-1 accuracy by 1.34%. Compared with the results obtained on the original VIPeR images in Fig. 8, Fig. 9 shows better results on the images preprocessed by the Multi-Scale Retinex algorithm. This demonstrates the effectiveness of the image preprocessing step in our proposed framework for person ReID.

Fig. 8. CMC curves on the VIPeR dataset when extracting GOG features with the optimal parameters (rank-1: 43.13% RGB, 44.30% Lab, 39.87% HSV, 39.56% nRnG, 51.04% fusion).

Fig. 9. The obtained results when applying the Retinex algorithm on the VIPeR dataset (rank-1: 44.59% RGB, 45.70% Lab, 40.41% HSV, 39.59% nRnG, 51.74% fusion).

To show the performance of GOG for person ReID when the person blobs are well determined, we evaluate the proposed method on the VIPeR Foreground dataset. The obtained results are shown in Fig. 10. The matching rate at rank-1 is 57.25%, a state-of-the-art result for this dataset. Note that the best previous result on this dataset, obtained by the saliency method [19], is 27.18%. These results show the effectiveness of GOG for person representation.

Table I shows the comparative results of the proposed method against state-of-the-art methods on the VIPeR dataset. It can be observed that the performance of the GOG feature

Fig. 10. Obtained results on the VIPeR Foreground dataset (rank-1: 49.59% RGB, 50.63% Lab, 49.68% HSV, 44.02% nRnG, 57.25% fusion).

outperforms all the state-of-the-art hand-designed features. It reaches a matching rate of 51.0% at rank-1, which is 11.0% better than LOMO [6], the best of the other hand-designed features. Moreover, the GOG feature provides impressive results that are even competitive with deep-learned ones. The recognition rates when applying the Retinex algorithm are approximately equal to those of the FNN method [7], in which the authors combined both hand-designed and deep-learned features at the cost of much more feature extraction time.

TABLE I. COMPARISON OF STATE-OF-THE-ART RESULTS ON THE VIPER DATASET. THE TWO BEST SCORES IN EACH COLUMN ARE MARKED WITH *; A DASH DENOTES A SCORE NOT REPORTED.

Type           | Method                 | R=1   | R=5   | R=10  | R=20
---------------+------------------------+-------+-------+-------+------
Hand-designed  | IRWPS [3]              | 23.2  | 45.3  | 58.3  | 68.7
               | SalMatch [4]           | 30.2  | 52.0  | 65.0  | -
               | SCNCD [5]              | 37.8  | 68.5  | 81.2  | 90.4
               | LOMO+XQDA [6]          | 40.0  | -     | 80.5  | 91.1
Deep-learning  | FNN [7]                | 51.1  | 81.0  | 91.4* | 96.9*
               | MCCNN [8]              | 47.8  | 74.7  | 84.8  | 91.1
               | RDC [9]                | 40.5  | 60.8  | 70.4  | 84.4
Proposed       | GOG-OptimalParameters  | 51.0  | 81.1* | 89.1  | 94.6
               | GOG-Retinex            | 51.7* | 80.8  | 89.8  | 95.3
               | GOG-Foreground         | 57.2* | 84.0* | 91.3* | 96.5*

V. CONCLUSIONS AND FUTURE WORK

In this paper, we propose a solution to make person ReID more practical. First, we re-implement the GOG feature extraction source code in C++ and verify that the obtained results match the original ones. With the C++ re-implementation, the feature extraction step takes only 0.109 s per person image (∼10 fps). Second, two parameters of the GOG feature are evaluated and their optimal values are chosen to achieve better performance. Finally, the proposed preprocessing step improves the accuracy of person ReID. The proposed framework thus addresses both main issues of practical person ReID: computation time and accuracy. However, in this paper we only apply the proposed framework to single-shot person ReID. In future work, we will evaluate the proposed method in the multi-shot case and develop a fully automatic person ReID system.

ACKNOWLEDGEMENTS

This research is funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant number 102.01-2017.12.

REFERENCES

[1] S. Karanam, M. Gou, Z. Wu, A. Rates-Borras, O. Camps, and R. J. Radke, "A systematic evaluation and benchmark for person re-identification: Features, metrics, and datasets," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018.
[2] T. Matsukawa, T. Okabe, E. Suzuki, and Y. Sato, "Hierarchical Gaussian descriptor for person re-identification," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1363–1372.
[3] Y.-C. Chang, C.-K. Chiang, and S.-H. Lai, "Single-shot person re-identification based on improved random-walk pedestrian segmentation," in 2012 International Symposium on Intelligent Signal Processing and Communications Systems (ISPACS). IEEE, 2012, pp. 1–6.
[4] R. Zhao, W. Ouyang, and X. Wang, "Person re-identification by salience matching," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 2528–2535.
[5] Y. Yang, J. Yang, J. Yan, S. Liao, D. Yi, and S. Z. Li, "Salient color names for person re-identification," in European Conference on Computer Vision. Springer, 2014, pp. 536–551.
[6] S. Liao, Y. Hu, X. Zhu, and S. Z. Li, "Person re-identification by local maximal occurrence representation and metric learning," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 2197–2206.
[7] S. Wu, Y.-C. Chen, X. Li, A.-C. Wu, J.-J. You, and W.-S. Zheng, "An enhanced deep feature representation for person re-identification," in 2016 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2016, pp. 1–8.
[8] D. Cheng, Y. Gong, S. Zhou, J. Wang, and N. Zheng, "Person re-identification by multi-channel parts-based CNN with improved triplet loss function," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1335–1344.
[9] S. Ding, L. Lin, G. Wang, and H. Chao, "Deep feature learning with relative distance comparison for person re-identification," Pattern Recognition, vol. 48, no. 10, pp. 2993–3003, 2015.
[10] M. Koestinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof, "Large scale metric learning from equivalence constraints," in 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2012, pp. 2288–2295.
[11] B. Moghaddam, T. Jebara, and A. Pentland, "Bayesian face recognition," Pattern Recognition, vol. 33, no. 11, pp. 1771–1782, 2000.
[12] B. J. Prosser, W.-S. Zheng, S. Gong, T. Xiang, and Q. Mary, "Person re-identification by support vector ranking," in BMVC, vol. 2, no. 5, 2010, p. 6.
[13] X. Zhu, B. Wu, D. Huang, and W.-S. Zheng, "Fast open-world person re-identification," IEEE Transactions on Image Processing, 2017.
[14] R. Kawai, Y. Makihara, C. Hua, H. Iwama, and Y. Yagi, "Person re-identification using view-dependent score-level fusion of gait and color features," in 2012 21st International Conference on Pattern Recognition (ICPR). IEEE, 2012, pp. 2694–2697.
[15] D. J. Jobson, Z.-u. Rahman, and G. A. Woodell, "A multiscale retinex for bridging the gap between color images and the human observation of scenes," IEEE Transactions on Image Processing, vol. 6, no. 7, pp. 965–976, 1997.
[16] D. J. Jobson, Z.-u. Rahman, and G. A. Woodell, "Properties and performance of a center/surround retinex," IEEE Transactions on Image Processing, vol. 6, no. 3, pp. 451–462, 1997.
[17] P. Li, Q. Wang, and L. Zhang, "A novel earth mover's distance methodology for image matching with Gaussian mixture models," in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1689–1696.
[18] D. Gray and H. Tao, "Viewpoint invariant pedestrian recognition with an ensemble of localized features," in European Conference on Computer Vision. Springer, 2008, pp. 262–275.
[19] T. B. Nguyen, V. P. Pham, T.-L. Le, and C. V. Le, "Background removal for improving saliency-based person re-identification," in 2016 Eighth International Conference on Knowledge and Systems Engineering (KSE). IEEE, 2016, pp. 339–344.
