Multimedia Tools and Applications https://doi.org/10.1007/s11042-018-6654-5
Re-ranking pedestrian re-identification with multiple Metrics
Shuze Geng¹ · Ming Yu¹,² · Yi Liu² · Yang Yu² · Jian Bai²
Received: 11 January 2018 / Revised: 27 August 2018 / Accepted: 5 September 2018
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract Pedestrian re-identification (re-ID) is a video surveillance technology for matching specific pedestrians across non-overlapping multi-camera scenes. However, because of dramatic changes in viewpoint and pedestrian occlusion, finding a stable, reliable and highly accurate algorithm remains a major challenge. In this paper, to increase the robustness and performance of re-ID, we propose a method that re-ranks refined re-ID results (i.e., initial lists) obtained from the kernel Local Fisher Discriminant Analysis (kLFDA) and Marginal Fisher Analysis (MFA) metrics. This raises the probability that the correct target appears in the initial result lists and also enhances robustness. During re-ranking, to distinguish highly similar pedestrians, a rigorous distance constraint model named the Perspective Distance Model (PDM) is designed to further reduce intra-class variation and enlarge inter-class variation. Using the PDM, the concise results obtained from the kLFDA and MFA metrics are re-ranked to better separate highly similar individuals and improve the re-ID rate. Experimental results on seven challenging re-ID datasets (VIPeR, CUHK01, Prid2011, iLIDS, CUHK03, Market-1501 and DukeReID) show that the proposed method is effective and achieves high performance.
Keywords Re-ranking · Pedestrian re-identification · Perspective distance model · Kernel local Fisher discriminant analysis · Marginal Fisher analysis
* Corresponding author: Ming Yu
1 School of Electronic Information Engineering, Hebei University of Technology, Tianjin 300401, China
2 School of Computer Science and Engineering, Hebei University of Technology, Tianjin 300401, China

1 Introduction Pedestrian re-identification (re-ID), which refers to seeking the same individual across non-overlapping camera views, has attracted extensive attention in recent years because of its wide applications in criminal investigation and video surveillance (our code is available at https://github.com/gengshuze/RPDM_RE-ID.git). However, owing to changes in image background, viewpoint, illumination and pedestrian pose, re-ID accuracy is still low. Figure 1 shows some typical samples of pedestrian images taken from different viewpoints: the same pedestrian appears in different poses under different views, which poses great challenges to re-ID. Existing re-ID methods can be divided into three categories [57]: feature extraction methods [2, 3, 11, 15, 17, 18, 20–24, 31, 34, 39, 41, 46, 54, 63, 64], metric learning methods [6, 14, 16, 26–29, 32, 47–49, 52, 55, 66], and re-ranking methods [1, 8, 12, 19, 25, 30, 37, 56–59, 61, 67]. Feature extraction methods seek reliable and distinctive descriptors of the appearance and structure of the human body to identify the same pedestrian. Metric learning methods aim to find a suitable metric model or function that yields more dependable matches across disjoint camera views, while re-ranking methods re-order the initial ranking lists by constructing discriminative models. Moreover, because re-ranking methods operate on ranking results rather than on similarity scores, they are more flexible and have become popular [57]. In essence, re-ranking can be regarded as automatic re-ordering of, and feedback on, the initial recognition results. Prosser et al. [37] formulated person re-ID as a ranking problem and proposed Ensemble RankSVM, which learns a subspace in which the potential positive match is ranked at the top. Building on this idea, Leng et al. [19] proposed a similarity measure based on content and context similarities and built a bidirectional ranking model to improve the initial ranking list.
Similarly, Ye [57] used k-nearest neighbor sets to compute similarity and dissimilarity with different baseline methods, and then aggregated them to further optimize re-ID performance. The k-nearest neighbor relationships between top-ranked persons can improve the initial result list [19, 57, 67]. However, these methods rest on an implicit assumption: a gallery image that appears in the k-nearest neighborhood of the probe in the initial result is likely to be a positive match and is therefore valuable for re-ordering [67]. If the top-k results of the probe are all negative matches, re-ranking produces a wrong result. In other words, the performance of re-ranking depends to a great extent on the accuracy of the original ranking list. Moreover, the original ranking list is usually obtained in a single projection space or with a single metric, which limits the robustness of the model. To improve re-ID performance and robustness, we propose a novel re-ranking method named RPDM (Re-ranking Perspective Distance Model), shown in Fig. 2, which uses a candidate set to refine the initial lists and applies the PDM to re-rank this candidate set, thereby improving robustness and increasing the probability of finding the correct target. The candidate set is built from two initial lists obtained with the kLFDA and MFA metrics, respectively: the top-G results of each list are selected, so the candidate set contains 2G samples, which raises the probability that it contains the correct target. Afterwards, to achieve higher performance and better generalization, the PDM is constructed to re-rank the candidate set. The model is based on several distance relationships and yields larger inter-class variation and smaller intra-class variation. In this way, it strengthens top-rank matching in re-ID, and the distance model can select more discriminative features to distinguish highly similar pedestrians effectively (see Section 4.4, Model Analysis). The main contributions of this paper are summarized as follows:
a) To increase the probability of retrieving the correct target and to improve robustness, the candidate set is constructed from two metrics (kLFDA and MFA).
b) To further improve the re-ID rate, the PDM is proposed with inter-class and intra-class distance constraints, which enable the matching model to acquire more stable and discriminative features for recognizing highly similar persons.
c) Experimental results on seven datasets (three large and four relatively small) demonstrate that the proposed method effectively improves the robustness and performance of pedestrian re-ID. In particular, on the relatively small datasets, the rank-1 rate is better than that of other state-of-the-art methods.
The rest of the paper is organized as follows. Section 2 introduces related work on pedestrian re-ID. Section 3 describes the proposed method in detail. In Section 4, our method is compared with several state-of-the-art methods on seven public datasets, and the effects of some parameters on model performance are analyzed. Finally, Section 5 briefly summarizes the paper.

Fig. 1 Typical examples of persons taken by different cameras. Each column belongs to one person; large variations exist due to changes in lighting, pose and viewpoint.
2 Related work Many methods have been proposed to solve the problem of pedestrian re-ID. Most of them can be classified as (1) feature extraction methods, (2) metric learning methods, or (3) re-ranking methods. In this section, we briefly introduce these three kinds of methods.
Fig. 2 Framework of RPDM
Feature extraction methods. Feature extraction methods attempt to find reliable and distinctive descriptors across different cameras. Prevalent person re-ID works mainly exploit appearance characteristics, among which color is the most frequently used cue and is often encoded as histograms [17, 41, 64]. For instance, Kuo et al. [39] adopted semantic color names to represent pedestrian images and constructed an appearance affinity model to incorporate color information, which clearly improves performance. Similar to [39], Shet et al. [54] used salient color names to construct a color descriptor that closely matches the inherent colors of the image. Besides, Zhao [63] extracted LAB color histograms from image patches and combined them with dense SIFT descriptors to obtain a more discriminative descriptor. However, the resulting feature is high-dimensional, which slows down matching. Texture-based characteristics are also employed. Lisanti et al. [24] proposed the weighted histogram of overlapping stripes (WHOS) descriptor based on color and texture. For each image, color features and Hue-Saturation histograms are extracted for each horizontal stripe, and re-identification is performed with an iterative re-weighting process. Each pixel is weighted by an Epanechnikov kernel centered on the image in order to emphasize the foreground and reduce the influence of the background. Experiments show that WHOS copes well with variations in viewpoint, illumination and pedestrian pose. Liao et al. proposed the local maximal occurrence (LOMO) descriptor [23], which combines color and texture: HSV color histograms and Scale Invariant Local Ternary Pattern (SILTP) texture features are extracted from overlapping image patches, and the occurrence of local patterns is maximized along the horizontal direction to build the descriptor. To handle illumination variations, a Retinex algorithm is applied before extraction, and the SILTP texture operator is invariant to intensity scale changes. Inspired by the covariance descriptor, Matsukawa et al. [32] proposed the Gaussian of Gaussian (GOG) descriptor based on Gaussian distribution models. Each image is divided into several horizontal strips, and each local patch within a strip is modeled by a Gaussian distribution. Each strip is then regarded as a set of such Gaussians, which is in turn summarized by a single Gaussian distribution. Experiments reveal that GOG outperforms LOMO on the VIPeR [57] and CUHK01 [64] datasets, but its extraction is slower than that of LOMO.
Metric learning methods. Metric learning has proven effective in many applications [48, 49] and is another important approach to re-ID. Its main purpose is to learn a suitable and discriminative metric model for recognizing persons across disjoint views [6, 16, 47, 52, 55, 66]. Most such methods build the metric model from intra-class and inter-class distances. For instance, Zheng [66] formulated logistic metric learning as a probabilistic relative distance comparison (PRDC) model that reduces the distance of positive pairs and increases the distance of negative pairs. A similar distance relationship was adopted in the KISS metric (KISSME) learning model [14], in which the distance metric is learned from equivalence constraints on positive and negative pairs across views. However, KISSME does not perform well on high-dimensional features.
Methods based on multiple metrics have been applied successfully in various areas [8, 26–30]. A self-trained discriminative multi-subspace model [56] was proposed by constructing pseudo pairwise relationships between labeled and unlabeled data; it can find pedestrian targets across disjoint camera views and outperforms state-of-the-art models on the VIPeR and iLIDS [11] datasets. Other successful metric learning models have also been proposed. For example, based on Bayesian sparse recovery theory, a relevance subject machine (RSM) model was proposed in [12] for multi-shot pedestrian re-ID. Lisanti et al. [25] applied logistic regression to learn the weights of projection spaces so that reliable discriminative features contribute more to re-ID.
Re-ranking methods. Re-ranking methods have been widely applied in re-ID to improve retrieval accuracy. Bai et al. [1] used the original pairwise distances to construct a sparse contextual activation (SCA) that represents each image as a feature vector; pedestrian images are then re-ranked by comparing these vectors with those of gallery persons under the Jaccard metric. Xie et al. [50] introduced a probe-specific re-ranking (PSR) framework to optimize the initial result list measured by an adaptive metric learning model. Moreover, some works exploit the similarity relationships of k-nearest neighbors to solve the re-ranking problem. To avoid the influence of false matches among the top-ranked results, k-reciprocal nearest neighbors were introduced in [1, 38]. The contextual dissimilarity model in [1] optimizes the similarity by computing the average distance between each image and its neighborhood. Qin et al. [38] proposed k-reciprocal nearest neighbors composed of highly relevant candidates and used them to build a similarity set for re-ranking the remainder of the gallery. To some extent, all of these methods enhance performance.
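To illustrate the k-reciprocal neighbor idea described above, the following NumPy sketch returns, for a given item, the items that are among its k nearest neighbors and also have it among their own k nearest neighbors. It assumes one square distance matrix over all images; the function names are hypothetical, and this is not the implementation of [1] or [38].

```python
import numpy as np

def k_nearest(dist: np.ndarray, i: int, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of item i, excluding i itself."""
    order = np.argsort(dist[i], kind="stable")
    order = order[order != i]
    return order[:k]

def k_reciprocal_neighbors(dist: np.ndarray, i: int, k: int) -> set:
    """Items j such that j is in kNN(i) and i is in kNN(j)."""
    return {int(j) for j in k_nearest(dist, i, k) if i in k_nearest(dist, j, k)}

# Toy example: four 1-D points; point 3 lies far from the others.
pts = np.array([0.0, 0.1, 0.3, 5.0])
dist = np.abs(pts[:, None] - pts[None, :])
print(k_reciprocal_neighbors(dist, 0, k=2))  # {1, 2}
```

Point 3 has points 2 and 1 as its nearest neighbors, but neither of them ranks point 3 among their own two nearest neighbors, so its k-reciprocal set is empty; this filtering is what suppresses isolated false matches.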
However, for feature extraction, it is extremely difficult to find a descriptor that is both invariant to all kinds of cross-camera variations and distinctive enough to resolve visual ambiguity. As for the other two categories, most metric learning methods learn a single distance metric for the features, and most re-ranking methods obtain the original ranking list in a single projection space or with a single metric. Such single learning structures are not robust to the complex nonlinear data structure of person re-ID, so their discriminative power may be reduced. In this paper, a candidate set is built from the initial result lists produced by two different metrics within our re-ranking framework, which increases the probability of retrieving the correct target. In addition, to further improve recognition performance, the PDM is constructed from distance constraints to distinguish highly similar pedestrians.
3 Proposed method At present, the initial list in most existing person re-ID methods is generated in a single projection space or with a single metric, which limits performance and may lead to unsatisfactory reliability. If the top-k results of the initial list do not contain the target pedestrian, re-ordering becomes meaningless. To solve this problem and improve the reliability of the initial list, we use the kLFDA and MFA metrics [51] to generate two initial lists and select the top-G results from each list to form a candidate set. Because pedestrian features are highly nonlinear in their geometric distribution, common dimension reduction methods such as principal component analysis (PCA) [36] and linear discriminant analysis (LDA) [53] tend to ignore the local geometric structure of the features. Both kLFDA and MFA are built on scatter matrices. In particular, kLFDA introduces local intra-class information through its weight matrix, whereas MFA introduces local inter-class information [51]. Thus kLFDA and MFA not only capture local details well but also make local intra-class and local inter-class information complementary, so stable and discriminative features can be mined from different feature spaces. Therefore, in this paper the kLFDA and MFA metrics are used to generate the initial lists. In this way, important pedestrian details can be supplemented from different projection spaces, and the reliability and robustness of the initial results are improved.
3.1 LFDA and MFA spaces The LFDA and MFA metrics are described as follows [51]. Let {(x_i, c_i)}_{i=1}^{N} be the set of pedestrian features, where x_i denotes the ith pedestrian feature, c_i ∈ {1, 2, …, C} is the corresponding identity label, C is the number of identities, and N is the number of training samples. Both LFDA and MFA use a projection matrix (M_LFDA and M_MFA, respectively) to minimize intra-class distances and maximize inter-class distances between similar pedestrians. They share the same discriminant objective function:

$$M_{LFDA}\,(M_{MFA}) = \arg\max_{M} \operatorname{tr}\!\left[\left(M^{T} S_{W} M\right)^{-1} M^{T} S_{B} M\right] \quad (1)$$

where tr(·) denotes the trace of a matrix. The intra-class scatter matrix S_W and the inter-class scatter matrix S_B are computed as
$$S_{W} = \frac{1}{2}\sum_{i,j=1}^{N} A^{W}_{ij}\,(x_i - x_j)(x_i - x_j)^{T} \quad (2)$$

$$S_{B} = \frac{1}{2}\sum_{i,j=1}^{N} A^{B}_{ij}\,(x_i - x_j)(x_i - x_j)^{T} \quad (3)$$
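The scatter matrices in Eqs. (2) and (3) need not be formed with an explicit double loop: for a symmetric weight matrix A with degree matrix D = diag(Σ_j A_ij), the identity ½ Σ_ij A_ij (x_i − x_j)(x_i − x_j)^T = X^T (D − A) X holds, where the rows of X are the samples. The NumPy sketch below checks that identity and then solves Eq. (1) as the generalized eigenproblem S_B v = λ S_W v through a Cholesky factor of S_W. The ridge term `reg` added to S_W and the random stand-in weights are assumptions for illustration; the paper does not say how S_W is regularized.

```python
import numpy as np

def scatter(X: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Eqs. (2)-(3) for a symmetric weight matrix A: X^T (D - A) X, D = diag(row sums)."""
    laplacian = np.diag(A.sum(axis=1)) - A
    return X.T @ laplacian @ X

def scatter_loop(X: np.ndarray, A: np.ndarray) -> np.ndarray:
    """Direct double sum 0.5 * sum_ij A_ij (x_i - x_j)(x_i - x_j)^T, used only as a check."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for i in range(len(X)):
        for j in range(len(X)):
            diff = X[i] - X[j]
            S += 0.5 * A[i, j] * np.outer(diff, diff)
    return S

def fisher_projection(S_W, S_B, r, reg=1e-6):
    """Top-r solutions of S_B v = lam (S_W + reg I) v; they maximize the trace ratio of Eq. (1)."""
    d = S_W.shape[0]
    L = np.linalg.cholesky(S_W + reg * np.eye(d))   # S_W + reg I = L L^T
    L_inv = np.linalg.inv(L)
    C = L_inv @ S_B @ L_inv.T                        # equivalent symmetric eigenproblem
    vals, U = np.linalg.eigh(C)                      # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:r]                 # keep the r largest
    return L_inv.T @ U[:, idx], vals[idx]            # map back: v = L^{-T} u

# Demo with random symmetric, non-negative stand-in weights (not the paper's A^W, A^B).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
A_w = rng.random((20, 20)); A_w = (A_w + A_w.T) / 2
A_b = rng.random((20, 20)); A_b = (A_b + A_b.T) / 2
S_W, S_B = scatter(X, A_w), scatter(X, A_b)
print(np.allclose(S_W, scatter_loop(X, A_w)))        # True
M, vals = fisher_projection(S_W, S_B, r=2)           # M has shape (5, 2)
```

The Laplacian form costs one matrix product instead of N² outer products, which matters for 6036-dimensional features; the Cholesky route is one standard way to turn the generalized problem into an ordinary symmetric one.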
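Here A^W_ij and A^B_ij are the intra-class and inter-class weights, respectively; they are the main factor that distinguishes LFDA from MFA. In LFDA,

$$A^{W}_{ij} = \begin{cases} A_{ij}/N_{c}, & \text{if } c_i = c_j = c \\ 0, & \text{if } c_i \neq c_j \end{cases} \quad (4)$$

$$A^{B}_{ij} = \begin{cases} A_{ij}\left(\dfrac{1}{N} - \dfrac{1}{N_{c}}\right), & \text{if } c_i = c_j = c \\ \dfrac{1}{N}, & \text{if } c_i \neq c_j \end{cases} \quad (5)$$

$$A_{ij} = \begin{cases} \cdots \\ 0, & \text{otherwise} \end{cases} \quad (6)$$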
Here N_c denotes the number of samples in class c and ε is a positive number; x_j ∈ N(x_i) means that x_j is a near neighbor of x_i. In MFA,

$$A^{W}_{ij} = \begin{cases} 1, & \text{if } x_j \in N(x_i) \text{ or } x_i \in N(x_j), \text{ and } c_i = c_j \\ 0, & \text{otherwise} \end{cases} \quad (7)$$

$$A^{B}_{ij} = \begin{cases} 1, & \text{if } x_j \in N(x_i) \text{ or } x_i \in N(x_j), \text{ and } c_i \neq c_j \\ 0, & \text{otherwise} \end{cases} \quad (8)$$

Eq. (4) shows that LFDA introduces local intra-class information and minimizes intra-class scatter, which effectively preserves the local geometric structure of the data. However, LFDA does not consider neighbor relationships between different classes, so boundary discriminative information between classes is ignored. In contrast, MFA models the local discriminant structure between adjacent points of different classes (Eq. (8)) and thus captures inter-class boundary information effectively. In other words, LFDA is better at capturing local intra-class information, whereas MFA is better at capturing inter-class information; this is why the two metrics are combined to mine stable and discriminative intra-class and inter-class information. In addition, kLFDA is the nonlinear (kernel) version of LFDA and further enhances recognition performance [51]. In the kernel space, the objective function can be written as

$$M_{kLFDA} = \arg\max_{M_k} \operatorname{tr}\!\left[\left(M_k^{T} \hat{K} S_{W} \hat{K} M_k\right)^{-1} M_k^{T} \hat{K} S_{B} \hat{K} M_k\right] \quad (9)$$
where K̂ denotes the reproducing kernel of the Hilbert space. The kernel function is chosen as the Gaussian kernel

$$\hat{K}(x_i, x_j) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^{2}}{2\gamma}\right) \quad (10)$$

where γ > 0. The objective function can be solved as a generalized eigenvalue problem; the detailed solution process can be found in [35].
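To make Eqs. (4), (5), (7), (8) and (10) concrete, the NumPy sketch below builds the LFDA weights from a given affinity matrix A (the affinity of Eq. (6) is taken as an input), the MFA weights from a symmetric k-nearest-neighbor relation N(·), and the Gaussian kernel matrix. The neighborhood size `k`, the use of one neighborhood for both MFA graphs (following Eqs. (7) and (8) literally), and the function names are illustrative assumptions, not settings from the paper.

```python
import numpy as np

def lfda_weights(A: np.ndarray, labels: np.ndarray):
    """Eqs. (4)-(5): LFDA intra-/inter-class weights from an affinity matrix A."""
    N = len(labels)
    same = labels[:, None] == labels[None, :]
    _, inverse, counts = np.unique(labels, return_inverse=True, return_counts=True)
    Nc = counts[inverse].astype(float)[:, None]      # N_c of the class of row i
    A_w = np.where(same, A / Nc, 0.0)
    A_b = np.where(same, A * (1.0 / N - 1.0 / Nc), 1.0 / N)
    return A_w, A_b

def pairwise_sq_dists(X: np.ndarray) -> np.ndarray:
    return np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)

def mfa_weights(X: np.ndarray, labels: np.ndarray, k: int):
    """Eqs. (7)-(8): 1 if x_j in N(x_i) or x_i in N(x_j), split by same/different label."""
    sq = pairwise_sq_dists(X)
    np.fill_diagonal(sq, np.inf)                     # a sample is not its own neighbour
    nn = np.argsort(sq, axis=1)[:, :k]               # N(x_i): the k nearest samples
    G = np.zeros(sq.shape, dtype=bool)
    G[np.arange(len(X))[:, None], nn] = True
    G = G | G.T                                      # symmetric "or" condition
    same = labels[:, None] == labels[None, :]
    return (G & same).astype(float), (G & ~same).astype(float)

def gaussian_kernel(X: np.ndarray, gamma: float) -> np.ndarray:
    """Eq. (10): K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2 * gamma)), gamma > 0."""
    return np.exp(-pairwise_sq_dists(X) / (2.0 * gamma))

X = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([0, 1, 0, 1])
A_w, A_b = mfa_weights(X, labels, k=1)
print(A_b.sum(), A_w.sum())   # 4.0 0.0
```

In this toy set every nearest neighbor carries a different label, so all MFA weight lands in the inter-class graph; this is the boundary information that, as noted above, LFDA's weights do not encode.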
3.2 Candidate set After the kLFDA and MFA projections, the original features are embedded into a low-dimensional feature space. To narrow the range of the second search, the Mahalanobis distance D(x_i, x_j) between each gallery feature and the target pedestrian is computed in the kLFDA and MFA spaces, respectively:

$$D(x_i, x_j) = (x_i - x_j)^{T} M' (x_i - x_j) \quad (11)$$

where M′ denotes M_kLFDA or M_MFA. From each of the two resulting ranking lists, the top-G candidates, i.e., those most similar to the target pedestrian under that metric, are selected to form a candidate set with 2G samples.
This process eliminates pedestrian images with low similarity to the target pedestrian and provides a smaller search gallery for the PDM in the re-ranking stage. Thus, the search difficulty is reduced and robustness is improved.
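A minimal sketch of the candidate-set construction of Eq. (11), assuming the two metrics are given as positive semi-definite d × d matrices M1 and M2 (standing in for M′ = M_kLFDA and M′ = M_MFA) and that features are NumPy arrays. The text counts the candidate set as 2G samples; when the two top-G lists overlap, this sketch keeps each gallery index once (first occurrence), which is our assumption.

```python
import numpy as np

def mahalanobis_to_probe(probe: np.ndarray, gallery: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Eq. (11): D = (x - p)^T M (x - p) for every gallery row x."""
    diff = gallery - probe                              # (n_gallery, d)
    return np.einsum("nd,de,ne->n", diff, M, diff)

def candidate_set(probe, gallery, M1, M2, G):
    """Top-G gallery indices under each metric, merged with duplicates removed (order kept)."""
    top1 = np.argsort(mahalanobis_to_probe(probe, gallery, M1), kind="stable")[:G]
    top2 = np.argsort(mahalanobis_to_probe(probe, gallery, M2), kind="stable")[:G]
    merged = np.concatenate([top1, top2])
    _, first = np.unique(merged, return_index=True)
    return merged[np.sort(first)]                       # at most 2G unique indices

# Toy metrics: M1 mainly weights the first dimension, M2 the second.
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [3.0, 0.0], [0.0, 3.0], [2.0, 2.0]])
probe = np.zeros(2)
M1, M2 = np.diag([1.0, 0.01]), np.diag([0.01, 1.0])
print(candidate_set(probe, gallery, M1, M2, G=2).tolist())  # [1, 3, 0, 2]
```

Each metric alone ranks a different pair of gallery images first; the merged set keeps both pairs, which illustrates why pooling two metrics raises the chance that the true match survives into the re-ranking stage.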
3.3 Perspective distance model After the first ranking, the candidate targets are highly similar to one another. For person re-ID, the ideal situation is that the correct match appears at the top of the gallery ranking for the query, which implies that the distance between any matched gallery sample and the query is much smaller than that between any unmatched sample and the query. Accordingly, the PDM is proposed to handle highly similar pedestrians and obtain a better ranking through intra-class and inter-class distance constraints on different sample pairs. On the one hand, the distance within a correct match pair is compared with the minimum distance over the associated false match pairs (pairs that share the sample x_i). On the other hand, the distance within a correct match pair is also compared with the minimum distance over the irrelevant false match pairs (pairs whose two identities differ from each other and from y_i). With this model, the distance of positive pairs is further reduced and the distance of negative pairs is increased. For each sample x_i, the following comparisons are made:

$$D(x_i, x_j) + \sigma < \min_{y_k \neq y_i} D(x_i, x_k), \quad y_i = y_j$$
$$D(x_i, x_j) + \sigma < \min_{y_k \neq y_l \neq y_i} D(x_k, x_l), \quad y_i = y_j \quad (12)$$

where y_i is the identity label of x_i, and y_i = y_j means that x_i and x_j belong to the same person; σ is a slack (margin) parameter that quantifies the comparison above and is set to 1.5. These constraints are enforced by minimizing the hinge loss

$$\min \sum_{\substack{x_i, x_j \\ y_i = y_j}} \max\!\left(D(x_i, x_j) - \min_{y_k \neq y_i} D(x_i, x_k) + \sigma,\ 0\right) + \sum_{\substack{x_i, x_j, x_k, x_l \\ y_i = y_j,\ y_k \neq y_l \neq y_i}} \max\!\left(D(x_i, x_j) - \min_{y_k \neq y_l \neq y_i} D(x_k, x_l) + \sigma,\ 0\right) \quad (13)$$

Note that the hinge loss is incurred only by positive pairs whose distance is not smaller, by at least the margin σ, than the minimum distance of the corresponding negative pairs. The above analysis addresses only inter-class separation.
As for intra-class variation, the distance between samples of the same class is minimized to strengthen the positive pairs, i.e.,

$$\min_{x_i, x_j:\ y_j = y_i} D(x_i, x_j) \quad (14)$$

The overall objective function is then formulated as

$$L(D) = (1-\lambda)\sum_{\substack{x_i, x_j \\ y_j = y_i}} D(x_i, x_j) + \lambda \sum_{\substack{x_i, x_j \\ y_j = y_i}} \max\!\left(D(x_i, x_j) - \min_{y_k \neq y_i} D(x_i, x_k) + \sigma,\ 0\right) + \lambda \sum_{\substack{x_i, x_j, x_k, x_l \\ y_i = y_j,\ y_k \neq y_l \neq y_i}} \max\!\left(D(x_i, x_j) - \min_{y_k \neq y_l \neq y_i} D(x_k, x_l) + \sigma,\ 0\right) \quad (15)$$

where λ ∈ [0, 1] is a weight parameter.
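The following NumPy sketch evaluates the objective of Eq. (15) from a pairwise distance matrix and identity labels. Two readings are our assumptions rather than statements of the paper: the third term is summed over positive pairs (x_i, x_j), with the inner minimum taken over all pairs (x_k, x_l) whose labels differ from each other and from y_i; and each positive pair is counted once per ordered pair (i, j) with i ≠ j. The default σ = 1.5 follows the text; the function name is hypothetical.

```python
import numpy as np

def pdm_loss(D: np.ndarray, y: np.ndarray, lam: float, sigma: float = 1.5) -> float:
    """Eq. (15) for a symmetric distance matrix D and label vector y."""
    n = len(y)
    same = y[:, None] == y[None, :]
    pos = same & ~np.eye(n, dtype=bool)          # ordered positive pairs (i, j), i != j
    diff = ~same

    pull = D[pos].sum()                          # first term: intra-class distances

    # Second term: hardest negative that shares the sample x_i.
    neg_i = np.where(diff, D, np.inf).min(axis=1)           # min_{y_k != y_i} D(i, k)
    hinge_i = np.maximum(D - neg_i[:, None] + sigma, 0.0)[pos].sum()

    # Third term: hardest negative pair (k, l) whose identities both differ from y_i.
    hinge_kl = 0.0
    for i, j in zip(*np.nonzero(pos)):
        mask = diff & (y[:, None] != y[i]) & (y[None, :] != y[i])
        if mask.any():
            hinge_kl += max(D[i, j] - D[mask].min() + sigma, 0.0)

    return (1 - lam) * pull + lam * (hinge_i + hinge_kl)

# Three identities on a line; identities 0 and 1 are close to each other.
x = np.array([0.0, 0.2, 3.0, 3.1, 10.0, 10.5])
y = np.array([0, 0, 1, 1, 2, 2])
D = np.abs(x[:, None] - x[None, :])
print(round(pdm_loss(D, y, lam=0.5, sigma=3.0), 4))   # 2.05
```

With σ = 3.0 the second term is active for identities 0 and 1 (their nearest negatives are only about 3 apart), and the third term is active for identity 2, whose positive pair is compared with the close 0/1 pair; with the paper's σ = 1.5 no hinge is active and only the (1 − λ) pull term remains.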
3.4 Optimization For simplicity, we use X_{i,j} to denote the outer product of a pairwise difference:

$$X_{i,j} = (x_i - x_j)(x_i - x_j)^{T} \quad (16)$$

Based on Eq. (16), D(x_i, x_j) can be rewritten as D(x_i, x_j) = tr(M X_{i,j}). Eq. (15) can be rewritten likewise:

$$L(D) = (1-\lambda)\sum_{\substack{x_i, x_j \\ y_j = y_i}} \operatorname{tr}(M X_{i,j}) + \lambda \sum_{\substack{x_i, x_j \\ y_j = y_i}} \max\!\left(\operatorname{tr}(M X_{i,j}) - \min_{y_k \neq y_i} \operatorname{tr}(M X_{i,k}) + \sigma,\ 0\right) + \lambda \sum_{\substack{x_i, x_j, x_k, x_l \\ y_i = y_j,\ y_k \neq y_l \neq y_i}} \max\!\left(\operatorname{tr}(M X_{i,j}) - \min_{y_k \neq y_l \neq y_i} \operatorname{tr}(M X_{k,l}) + \sigma,\ 0\right) \quad (17)$$

In the proposed model, a projected stochastic gradient descent method is applied to compute the matrix M in Eq. (17). Since Eq. (17) is piecewise linear in M, we follow the method in [60] to solve for M. At the t-th iteration, let M = M_t, with the initialization M_0 = I. The stochastic (sub)gradient at step t and the update are

$$\frac{\partial L}{\partial M_t} = (1-\lambda)\sum_{i,j} X_{i,j} + \lambda \sum_{i,j,k,l} \left(2X_{i,j} - X_{i,k} - X_{k,l}\right), \qquad M_{t+1} = M_t - \alpha \frac{\partial L}{\partial M_t} \quad (18)$$

where α is the step size and the second sum runs over the index combinations for which both hinge terms in Eq. (17) are active.
The optimization of Eq. (18) must satisfy the constraint that M_{t+1} is always positive semi-definite (PSD). Therefore, after each iteration M_{t+1} is projected onto the cone of PSD matrices. Specifically, we perform the eigen-decomposition

$$M_{t+1} = V_{t+1} Q_{t+1} V_{t+1}^{T} \quad (19)$$

where Q_{t+1} is the diagonal matrix of eigenvalues. Q_{t+1} is then updated by keeping only its non-negative eigenvalues (negative ones are set to zero), and M_{t+1} is rebuilt from V_{t+1} and the updated Q_{t+1}.
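The update of Eqs. (18) and (19) can be sketched in NumPy as one projected subgradient step. The sketch accumulates the subgradient term by term, which coincides with the 2X_{i,j} − X_{i,k} − X_{k,l} form of Eq. (18) when both hinge terms of a positive pair are active, and then clips negative eigenvalues to zero. The reading of the third term (a single minimizing pair (k, l) per positive pair), the step size `alpha`, and the function names are our assumptions.

```python
import numpy as np

def outer_diff(X: np.ndarray, i: int, j: int) -> np.ndarray:
    """Eq. (16): X_ij = (x_i - x_j)(x_i - x_j)^T."""
    d = X[i] - X[j]
    return np.outer(d, d)

def project_psd(M: np.ndarray) -> np.ndarray:
    """Eq. (19): eigen-decompose and set negative eigenvalues to zero."""
    M = (M + M.T) / 2                                  # guard against round-off asymmetry
    q, V = np.linalg.eigh(M)
    return (V * np.maximum(q, 0.0)) @ V.T

def pdm_step(M, X, y, lam, alpha, sigma=1.5):
    """One projected subgradient step on Eq. (17)."""
    n = len(y)
    diff = X[:, None, :] - X[None, :, :]
    D = np.einsum("abd,de,abe->ab", diff, M, diff)     # D(x_a, x_b) = tr(M X_ab)
    grad = np.zeros_like(M)
    for i in range(n):
        for j in range(n):
            if i == j or y[i] != y[j]:
                continue
            Xij = outer_diff(X, i, j)
            grad += (1 - lam) * Xij                    # intra-class pull term
            neg = np.nonzero(y != y[i])[0]
            if neg.size:
                k = neg[np.argmin(D[i, neg])]          # hardest negative sharing x_i
                if D[i, j] - D[i, k] + sigma > 0:
                    grad += lam * (Xij - outer_diff(X, i, k))
            mask = (y[:, None] != y[None, :]) & (y[:, None] != y[i]) & (y[None, :] != y[i])
            if mask.any():                             # hardest pair not involving y_i
                kk, ll = np.unravel_index(np.argmin(np.where(mask, D, np.inf)), D.shape)
                if D[i, j] - D[kk, ll] + sigma > 0:
                    grad += lam * (Xij - outer_diff(X, kk, ll))
    return project_psd(M - alpha * grad)

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [0.0, 4.0]])
y = np.array([0, 0, 1, 1])
M = np.eye(2)                                          # M_0 = I, as in the text
for _ in range(5):
    M = pdm_step(M, X, y, lam=0.5, alpha=0.01)
```

The double loop is kept for readability; in practice the positive pairs would be sampled stochastically, as the "stochastic" in the text suggests, rather than enumerated exhaustively.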
4 Experiments In this section, we describe feature extraction, the datasets, the evaluation protocol and parameter settings, model analysis and performance comparison. To verify the effectiveness of our method, we compare RPDM with several state-of-the-art methods on seven public re-ID datasets, i.e., VIPeR [57], CUHK01 [64], Prid2011 [31], iLIDS [11], CUHK03 [42], Market-1501 [5] and DukeReID [13]. For quantitative evaluation, the widely used cumulative match characteristic (CMC) curve and mean average precision (mAP) [10] are employed.
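For reference, the following NumPy sketch computes single-query CMC scores and mAP from a probe-by-gallery distance matrix, assuming every probe has at least one true match in the gallery. It omits dataset-specific rules (for example, the same-camera and junk-image filtering used in Market-1501 and DukeReID evaluation), so it illustrates the two measures rather than reproducing the exact protocol of [10]; the function name is hypothetical.

```python
import numpy as np

def cmc_and_map(dist, probe_ids, gallery_ids, max_rank=10):
    """Single-query CMC (ranks 1..max_rank) and mAP from an (n_probe, n_gallery) distance matrix."""
    order = np.argsort(dist, axis=1, kind="stable")
    matches = gallery_ids[order] == probe_ids[:, None]   # (n_probe, n_gallery) booleans
    # CMC: fraction of probes whose first correct match lies within rank r.
    first_hit = matches.argmax(axis=1)                   # assumes >= 1 match per probe
    cmc = np.array([(first_hit < r).mean() for r in range(1, max_rank + 1)])
    # AP: mean of the precision values at the positions of the correct matches.
    aps = []
    for row in matches:
        hits = np.nonzero(row)[0]
        precision = np.arange(1, len(hits) + 1) / (hits + 1)
        aps.append(precision.mean())
    return cmc, float(np.mean(aps))

dist = np.array([[0.1, 0.5, 0.3], [0.1, 0.5, 0.4]])
probe_ids = np.array([1, 2])
gallery_ids = np.array([1, 2, 2])
cmc, mAP = cmc_and_map(dist, probe_ids, gallery_ids, max_rank=3)
print(round(mAP, 4))   # 0.7917
```

CMC only looks at the first correct match, while AP also rewards retrieving the remaining correct matches early; the second probe here has rank-2 CMC but an AP of (1/2 + 2/3)/2.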
4.1 Feature extraction In this paper, color features and LBP features are used to describe pedestrians [66]. Each pedestrian image is divided evenly into 18 horizontal bands. For each of the four color spaces (RGB, HSV, YCbCr and Lab), a 16-bin histogram is extracted for each color channel (R, G, B, H, S, V, etc.) in each band, giving 18 × 4 × 3 × 16 = 3456 color dimensions. In addition, to better describe local texture, two LBP features are extracted, with radii of 1 and 2 and with 8 and 16 neighboring points, respectively, giving a total of 2580 dimensions. Finally, the color and LBP features are concatenated into a 6036-dimensional (3456 + 2580) pedestrian descriptor.
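A minimal sketch of the 3456-dimensional color part, assuming the image has already been converted into the four color spaces and stacked into an H × W × 12 array with every channel scaled to [0, 1]. The color conversion, the L1 normalization of each histogram, and the function name are our assumptions, and the LBP part is omitted.

```python
import numpy as np

def band_color_histograms(img12: np.ndarray, n_bands: int = 18, n_bins: int = 16) -> np.ndarray:
    """Per-band, per-channel histograms: n_bands * n_channels * n_bins values."""
    feats = []
    for band in np.array_split(img12, n_bands, axis=0):   # horizontal bands (7 or 8 rows each)
        for c in range(band.shape[2]):
            h, _ = np.histogram(band[:, :, c], bins=n_bins, range=(0.0, 1.0))
            feats.append(h / max(h.sum(), 1))              # L1-normalize each histogram
    return np.concatenate(feats)

rng = np.random.default_rng(0)
img = rng.random((128, 48, 12))          # 128 x 48 pixels, 4 color spaces x 3 channels
print(band_color_histograms(img).shape)  # (3456,)
```

Because 128 rows do not divide evenly into 18 bands, `np.array_split` produces bands of 7 or 8 rows; the feature length is still 18 × 12 × 16 = 3456.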
4.2 Datasets The proposed method is evaluated on seven public re-ID datasets, i.e., VIPeR, CUHK01, Prid2011, iLIDS, CUHK03, Market-1501 and DukeReID. These datasets reflect the factors that affect person re-ID in the real world, such as viewpoint, illumination, pedestrian pose and occlusion. Table 1 summarizes the statistics of each dataset, and Fig. 3 shows some samples from the seven datasets. VIPeR: VIPeR is one of the most commonly used datasets and contains 632 pedestrian image pairs. Each pair is captured by two different cameras in an outdoor academic setting, so each pedestrian has two images taken from different viewpoints. Significant viewpoint change, pose variation and illumination variation make it one of the most challenging datasets. CUHK01: CUHK01 contains 3884 images of 971 pedestrians captured by two cameras on a college campus. Camera A mainly captures the front and back of each pedestrian, while camera B captures the side. Each pedestrian has two images from each camera view. Prid2011: Prid2011 is composed of 1134 images recorded by two cameras, which capture 385 and 749 persons, respectively. Only 200 persons appear in both camera views. This dataset not only suffers from viewpoint and illumination variation but also includes distractors. iLIDS: The iLIDS dataset consists of 479 images of 119 pedestrians captured in the arrival hall of a busy airport. Each individual has 2–8 images. Since this dataset was collected in a crowded place, most images have severe occlusions and illumination changes.
Table 1 Characteristics of re-ID datasets

| Dataset | No. of persons | No. of images | No. of distractors | No. of cameras |
|---|---|---|---|---|
| VIPeR | 632 | 1264 | 0 | 2 |
| CUHK01 | 971 | 3884 | 0 | 2 |
| Prid2011 | 200 | 849 | 649 | 2 |
| iLIDS | 119 | 479 | 0 | 2 |
| CUHK03 | 1467 | 13,164 | 0 | 6 |
| Market-1501 | 1501 | 32,217 | 2793 | 6 |
| DukeReID | 1404 | 34,183 | 2195 | 8 |
Fig. 3 Samples from the seven person re-ID datasets
CUHK03: The CUHK03 dataset contains 13,164 images of 1467 pedestrians captured by six cameras on the CUHK campus. Each identity appears in only two disjoint camera views, with 4.8 images per view on average. Market-1501: Market-1501 is composed of six camera views and contains 32,217 images of 1501 identities. The camera resolutions differ: five cameras are high-resolution and one is low-resolution. DukeReID: DukeReID is a large-scale, heavily labeled dataset containing 34,183 images of 1404 identities that appear in more than two cameras. Image width, height and aspect ratio vary greatly in this dataset.
4.3 Evaluation protocol and parameter setting Our experiments follow the evaluation protocol in [64]. Half of the samples are randomly selected for training, and the rest are used for testing. To compute CMC values, the testing set is divided into a gallery set and a probe set [64]. Because single-query evaluation validates a method better than multiple-query evaluation, all results in this paper are based on single-query [10]. The evaluation is repeated ten times to obtain stable statistics, and the average results are reported in Sections 4.4 and 4.5. Moreover, each image in every dataset is resized to 128 × 48 pixels, and the descriptor uses common parameter settings. The parameter G is the number of images taken from each initial list. Figure 4a shows how the rank-1 re-ID rate varies as 2G ranges from 2 to 60 on the seven datasets. On the four relatively small datasets (VIPeR, CUHK01, Prid2011, iLIDS), the rank-1 rates remain high and relatively stable when 2G is between 20 and 30. On the three large datasets (CUHK03, Market1501, DukeReID), the rank-1 rate increases gradually with 2G while 2G < 48, and it is good and stable when 2G is between 48 and 54. The main reason for this difference is that large datasets contain more similar images than small ones, so the algorithm needs more images from the initial lists to include the correct target with high probability. Therefore, G is set to 10 on the small datasets (VIPeR, CUHK01, Prid2011, iLIDS) and to 25 on the large datasets (CUHK03, Market1501, DukeReID). The parameter λ controls the influence of the second and third terms of the PDM model (Eq. (17)). Figure 4b shows how the rank-1 re-ID rate varies on the seven datasets as λ changes from 0.1 to 1. If λ > 0.5, these terms exert a restraining effect on the other constraints. But if
Fig. 4 (a) Variation of the rank-1 re-ID rate as 2G changes from 2 to 60 on the seven datasets. (b) Variation of the rank-1 re-ID rate as λ changes from 0.1 to 1 on the seven datasets. (c) Value of the PDM objective function versus the number of iterations on the seven datasets.
λ