Latent Constrained Correlation Filters for Object Localization

arXiv:1606.02170v1 [cs.CV] 7 Jun 2016


Shangzhen Luan^1, Baochang Zhang^1, Jungong Han^2, Chen Chen^3, Ling Shao^2, Alessandro Perina^4, Linlin Shen^5

^1 School of Automation Science and Electrical Engineering, Beihang University, Beijing, China
^2 Northumbria University, Newcastle upon Tyne, NE1 8ST, United Kingdom
^3 Center for Research in Computer Vision, University of Central Florida
^4 Microsoft Corporation, Redmond, WA, USA
^5 Shenzhen University, Shenzhen, China

Abstract. There is a neglected fact in traditional machine learning methods: data sampling actually leads to solution sampling. We consider this observation important because having the solution sampling available makes variable distribution estimation, which is a problem in many learning-related applications, more tractable. In this paper, we implement this idea on correlation filters, which have attracted much attention in the past few years due to their high performance at a low computational cost. More specifically, we propose a new method, named latent constrained correlation filters (LCCF), by mapping the correlation filters to a given latent subspace, in which we establish a new learning framework that embeds distribution-related constraints into the original problem. We further introduce a subspace-based alternating direction method of multipliers (SADMM) to efficiently solve the optimization problem, which is proved to converge at the saddle point. Our approach is successfully applied to two different tasks: eye localization and car detection. Extensive experiments demonstrate that LCCF outperforms the state-of-the-art methods when samples suffer from noise and occlusion.

Keywords: Correlation filter, ADMM, Subspace

1 Introduction

The correlation filter has attracted much attention due to its simplicity and high efficiency. It is usually trained in the frequency domain with the aim of producing a strong correlation peak on the pattern of interest while suppressing the response to the background. To this end, a regression process is usually used to obtain a Gaussian output that is robust against shifting. The correlation filter method was first proposed by Hester and Casasent as synthetic discriminant functions (SDF) [1], which focused mainly on formulating the theory. To facilitate more practical applications, many variations have been proposed, including both constrained and unconstrained correlation filters. The former refer to minimum average correlation energy (MACE) filters [2] and optimal tradeoff filters (OTF) [3], while the latter mainly embrace unconstrained MACE (UMACE) [4] and maximum average correlation height (MACH) filters [5]. Recently, the average of synthetic exact filters (ASEF) [6] and minimum output sum of squared error (MOSSE) filters [7] were introduced to enhance the suitability for real applications.

In general, the existing correlation filtering algorithms work well in ideal situations. However, their performance degrades dramatically when dealing with distorted data, such as occlusion, noise, illumination changes, and shifting. Various algorithms have been proposed to address these distortions. For instance, in [18], a patch-based correlation filter was introduced to increase the resistance to occlusions. In [17], a nonlinear kernel method was exploited for the correlation filter (KCF) to deal with texture variations. Instead of using more sophisticated kernel functions, multi-channel correlation filters (MCCF) [8] take advantage of multi-channel features, such as histograms of oriented gradients (HOG) [9] and the scale-invariant feature transform (SIFT) [10], in which each feature responds differently and the outputs are combined in order to achieve high performance. In [12], a new correlation filter was developed to drastically reduce the boundary effects that affect correlation filters. Regarding applications, correlation filtering is beneficial to object detection tasks, such as eye or car detection.

Problem: Given the training samples, the core of correlation filtering is to find optimal filters, which involves estimating the distribution of unknown variables.
Traditional algorithms normally adopt one of the following schemes: 1) finding a single filter (i.e., a channel filter [8]) trained by a regression process based on all training samples [8,12], or 2) finding a set of sub-filters (a single filter per image [6]) and eventually integrating them into one filter, where the combination can be either based on averaging the sub-filters in an off-line manner [6] or on an on-line iterative updating procedure [17]. According to the literature, the performance of the second scheme is better than that of the first one [17], though it is computationally more expensive. From the variable distribution perspective, the second scheme is equivalent to estimating the variable distribution based on only a limited amount of sampled data (e.g., one single image used in ASEF), which fails to consider the variations existing in the data. Our idea is instead to constrain the solution with a latent subspace spanned by the sampled sub-filters. Another key issue in implementing such an idea is how to efficiently embed these subspace constraints in the optimization process. In this paper, we propose a subspace-based alternating direction method of multipliers (SADMM). The classical ADMM is an algorithm that solves convex optimization problems by breaking them into smaller pieces, each of which can be solved easily [19]. However, the original ADMM cannot be directly applied to our problem because it cannot handle the subspace constraint. In contrast, the proposed SADMM is more flexible and is proved to converge at the saddle point, therefore enabling a faster algorithm. To sum up, our proposed latent constrained correlation filters (LCCF) based on SADMM differ from previous approaches in two aspects:

3

Fig. 1. The framework of LCCF. Based on different sample sets, we obtain $\hat{\mathbf{h}}^{[0:k]}$, which forms a subspace; $\hat{\mathbf{g}}^{[k]}$ is obtained from this subspace via a projection $\Phi$. We further solve $\hat{\mathbf{h}}^{[k+1]}$ based on $\hat{\mathbf{g}}^{[k]}$.

• To the best of our knowledge, LCCF is the first method that solves correlation filtering with a latent subspace constraint. During the training process, the new filters are projected onto subspaces, where a stable correlation filter is eventually regularized.
• We apply a new SADMM algorithm to solve our optimization problem with high efficiency and feasibility. We prove that the algorithm theoretically converges at the saddle point.

It is worth highlighting that LCCF fully retains the advantages of the original model, including high efficiency and memory saving. In addition, it can yield a better result thanks to the subspace constraint.

Notation: In this paper, scalars are represented by italic letters (e.g., $B$), vectors are in lowercase boldface (e.g., $\mathbf{x}$), and matrices and super-vectors are in uppercase boldface (e.g., $\mathbf{X}$). The sign $\hat{}$ represents the Fourier form of a variable (e.g., $\hat{\mathbf{h}}$ is the Fourier form of $\mathbf{h}$). $T$ is the matrix transpose operator. The operator diag converts a $D$-dimensional vector into a $D \times D$ matrix whose diagonal elements are the original vector. The subscript $i$ denotes the $i$th element in a data set (i.e., $\mathbf{x}_i$ refers to the $i$th sample in a training or test set). The superscript refers to the iteration index of a variable (i.e., $\hat{\mathbf{h}}^{[k]}$ denotes the variable $\hat{\mathbf{h}}$ in the $k$th iteration).

2 Review of correlation filters

The solution to correlation filters, e.g., the multi-channel correlation filter, can be regarded as an optimization problem minimizing an objective function $E(\mathbf{h})$:

$$E(\mathbf{h}) = \frac{1}{2}\sum_{i=1}^{N}\Big\|\mathbf{y}_i - \sum_{k=1}^{K}\mathbf{h}_k^{T} \otimes \mathbf{x}_i^{[k]}\Big\|_2^2 + \frac{\lambda}{2}\sum_{k=1}^{K}\|\mathbf{h}_k\|_2^2, \qquad (1)$$


where $N$ represents the number of images in the training set and $K$ is the total number of channels. In Eq. (1), $\mathbf{x}_i$ refers to a $K$-channel feature/image input obtained by a texture feature extraction process, while $\mathbf{y}_i$ is a given response whose peak is located at the target of interest. $k$ denotes the $k$th channel. The single-channel response is a $D$-dimensional vector, $\mathbf{y}_i = [y^{[1]}, ..., y^{[D]}]^T \in \mathbb{R}^D$. Both $\mathbf{x}_i$ and $\mathbf{h}$ are $K \times D$-dimensional super-vectors that refer to the multi-channel image and filter, respectively. Correlation filters are usually constructed in the frequency domain. Therefore, to solve Eq. (1) efficiently, we transform the original problem into the frequency domain by the Fast Fourier Transform (FFT). Then, Eq. (1) becomes:

$$E(\hat{\mathbf{h}}) = \frac{1}{2}\sum_{i=1}^{N}\Big\|\hat{\mathbf{y}}_i - \sum_{k=1}^{K}\mathrm{diag}(\hat{\mathbf{x}}_i^{[k]})^{T}\hat{\mathbf{h}}_k\Big\|_2^2 + \frac{\lambda}{2}\sum_{k=1}^{K}\|\hat{\mathbf{h}}_k\|_2^2, \qquad (2)$$

where $\hat{\mathbf{h}}$, $\hat{\mathbf{x}}$, and $\hat{\mathbf{y}}$ refer to the Fourier forms of $\mathbf{h}$, $\mathbf{x}$, and $\mathbf{y}$, respectively. If we set

$$\hat{\mathbf{h}} = [\hat{\mathbf{h}}_1^T, ..., \hat{\mathbf{h}}_K^T]^T, \qquad \hat{\mathbf{X}}_i = [\mathrm{diag}(\hat{\mathbf{x}}_i^{[1]})^T, ..., \mathrm{diag}(\hat{\mathbf{x}}_i^{[K]})^T], \qquad (3)$$

Eq. (2) can be further simplified to:

$$E(\hat{\mathbf{h}}) = \frac{1}{2}\sum_{i=1}^{N}\|\hat{\mathbf{y}}_i - \hat{\mathbf{X}}_i\hat{\mathbf{h}}\|_2^2 + \frac{\lambda}{2}\|\hat{\mathbf{h}}\|_2^2. \qquad (4)$$

We rewrite Eq. (4) as

$$E(\hat{\mathbf{h}}) = \frac{1}{2}\sum_{i=1}^{N}(\hat{\mathbf{y}}_i - \hat{\mathbf{X}}_i\hat{\mathbf{h}})^T(\hat{\mathbf{y}}_i - \hat{\mathbf{X}}_i\hat{\mathbf{h}}) + \frac{\lambda}{2}\hat{\mathbf{h}}^T\hat{\mathbf{h}}. \qquad (5)$$

To minimize the above $E(\hat{\mathbf{h}})$, we set the derivative of Eq. (5) to zero and simplify it, resulting in a solution in the frequency domain:

$$\hat{\mathbf{h}} = \Big(\lambda\mathbf{I} + \sum_{i=1}^{N}\hat{\mathbf{X}}_i^T\hat{\mathbf{X}}_i\Big)^{-1}\sum_{i=1}^{N}\hat{\mathbf{X}}_i^T\hat{\mathbf{y}}_i. \qquad (6)$$

Here, since $\hat{\mathbf{X}}_i$ is usually a sparse matrix, one can transform solving the $KD \times KD$ linear system into solving $D$ independent $K \times K$ linear systems. By doing so, the correlation filter calculation exhibits excellent computational and memory efficiency.
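As a concrete illustration of the closed form in Eq. (6), the following sketch trains a multi-channel filter on 1-D signals by solving one small $K \times K$ ridge system per frequency bin. The function name `train_mccf`, the flattened `(K, D)` layout, and the value of `lam` are our own assumptions, not the authors' implementation.

```python
import numpy as np

def train_mccf(xs, y, lam=1e-2):
    """Frequency-domain multi-channel correlation filter (Eq. (6) sketch).

    xs:  list of N training signals, each of shape (K, D) -- K channels,
         D samples. y: desired response of shape (D,).
    Returns the filter h_hat of shape (K, D) in the Fourier domain.
    For each frequency bin d we solve an independent K x K ridge system,
    as described in the text.
    """
    K, D = xs[0].shape
    y_hat = np.fft.fft(y)
    # Per-frequency normal equations: A[d] is K x K, b[d] is a K-vector.
    A = np.zeros((D, K, K), dtype=complex)
    b = np.zeros((D, K), dtype=complex)
    for x in xs:
        x_hat = np.fft.fft(x, axis=1)          # FFT per channel
        for d in range(D):
            v = x_hat[:, d]                    # K-vector for bin d
            A[d] += np.outer(np.conj(v), v)
            b[d] += np.conj(v) * y_hat[d]
    A += lam * np.eye(K)                       # regularization term
    # D independent K x K solves via batched linear algebra.
    h_hat = np.linalg.solve(A, b[..., None])[..., 0]
    return h_hat.T                             # shape (K, D)
```

A trained filter can then be applied by an element-wise product of `h_hat` with the test signal's per-channel FFT, summed over channels.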

3 Latent constrained correlation filter based on SADMM

In order to exploit the property of solution sampling from data sampling, we add a subspace constraint to the original optimization problem. That is to say, instead of estimating a real distribution function of the unsolved variable, we solve the problem based on a subspace containing the sub-solutions. Specifically, we add a new variable $\hat{\mathbf{g}}$ to our optimization problem, which represents the mapping of $\hat{\mathbf{h}}$ into a specific subspace: $\hat{\mathbf{h}} \rightarrow \hat{\mathbf{g}} \in S$. The goal is to explicitly impose the subspace constraints through the cloned variable, although this inevitably brings extra storage costs due to the variable replication. Our optimization problem can thus be summarized as:

$$\text{minimize } E(\hat{\mathbf{h}}) \quad \text{subject to } \hat{\mathbf{h}} = \hat{\mathbf{g}};\ \hat{\mathbf{g}} \in S. \qquad (7)$$

In the frequency domain, similar to that of [1], the objective function in Eq. (7) can be expressed as:

$$E(\hat{\mathbf{h}}) = \frac{1}{2}\sum_{i=1}^{B}\|\hat{\mathbf{y}}_i - \hat{\mathbf{X}}_i\hat{\mathbf{h}}\|_2^2 + \frac{\lambda}{2}\|\hat{\mathbf{h}}\|_2^2 + \frac{\sigma}{2}\|\hat{\mathbf{g}} - \hat{\mathbf{h}}\|_2^2, \qquad (8)$$

where $S$ refers to a well-designed subspace, and $\lambda$ and $\sigma$ are regularization parameters. $\hat{\mathbf{g}}$ is recovered from the subspace $S$ built from $\hat{\mathbf{h}}$, which is later defined through the function $\Phi$.

Unlike the classical alternating direction method of multipliers (ADMM), which uses partial updates for the dual variables in a rigorous process, our problem is more difficult to solve given that $\hat{\mathbf{g}}$ is defined based on a subspace $S$ spanned by $\hat{\mathbf{h}}$. The solution of Eq. (8) becomes complex due to the new constraint. To solve this problem, we propose a subspace-based ADMM (SADMM) algorithm, which makes use of an iterative process similar to that in [19]. Specifically, after the variable replication, $\hat{\mathbf{g}}$ is calculated according to a given subspace. This means that we can find $\hat{\mathbf{h}}^{[k+1]}$ based on $\hat{\mathbf{h}}^{[k]}$ and $\hat{\mathbf{g}}^{[k]}$ in the $k$th iteration. Next, we expand the training set by adding a number of training samples. $\hat{\mathbf{g}}^{[k+1]}$ is calculated based on the subspace spanned by $\hat{\mathbf{h}}^{[0:k]}$, which includes the sub-filters from $\hat{\mathbf{h}}^{[0]}$ (initialized) to $\hat{\mathbf{h}}^{[k]}$. This iterative process is described as follows:

$$\hat{\mathbf{h}}^{[k+1]} = \operatorname{argmin} E(\hat{\mathbf{h}}\,|\,\hat{\mathbf{g}}^{[k]}), \qquad \hat{\mathbf{g}}^{[k+1]} = \Phi(\hat{\mathbf{h}}^{[k+1]}, \hat{\mathbf{h}}^{[0:k]}). \qquad (9)$$

It should be noted that we would have $\hat{\mathbf{g}}^{[k+1]} = \operatorname{argmin} E(\hat{\mathbf{g}}\,|\,\hat{\mathbf{h}}^{[k]})$ if we adopted ADMM [19] to solve our problem; in that case $\hat{\mathbf{g}}$ and $\hat{\mathbf{h}}$ would actually be equal, which implies that the subspace would not help to solve our problem. In addition, our theoretical investigation shows that the convergence speed of SADMM is two times faster than that of ADMM [19]. To solve Eq. (9), we calculate the partial derivatives of Eq. (8), and thus have:

$$\frac{\partial E(\hat{\mathbf{h}}^{[k+1]})}{\partial \hat{\mathbf{h}}^{[k+1]T}} = \Big(\sum_{i=1}^{B}\hat{\mathbf{X}}_i^T\hat{\mathbf{X}}_i + \lambda\mathbf{I} + \sigma\mathbf{I}\Big)\hat{\mathbf{h}}^{[k+1]} - \sum_{i=1}^{B}\hat{\mathbf{X}}_i^T\hat{\mathbf{y}}_i - \sigma\hat{\mathbf{g}}^{[k]}, \qquad (10)$$

where $B$ is the size of the training set. Setting the derivative to zero gives $\hat{\mathbf{h}}^{[k+1]}$:

$$\hat{\mathbf{h}}^{[k+1]} = \mathbf{H}^{-1}\Big(\sum_{i=1}^{B}\hat{\mathbf{X}}_i^T\hat{\mathbf{y}}_i + \sigma^{[k]}\hat{\mathbf{g}}^{[k]}\Big), \qquad (11)$$

where

$$\mathbf{H} = \sum_{i=1}^{B}\hat{\mathbf{X}}_i^T\hat{\mathbf{X}}_i + \lambda\mathbf{I} + \sigma^{[k]}\mathbf{I}, \qquad (12)$$

and then $\hat{\mathbf{g}}^{[k+1]}$ is calculated as:

$$\hat{\mathbf{g}}^{[k+1]} = \Phi(\hat{\mathbf{h}}^{[k+1]}, \hat{\mathbf{h}}^{[0:k]}) = \sum_{i=1}^{k}\frac{1}{w_i}\hat{\mathbf{h}}^{[i]}, \qquad (13)$$

where $w_i$ is the Euclidean distance between $\hat{\mathbf{h}}^{[k+1]}$ and $\hat{\mathbf{h}}^{[i]}$.

After several iterations, $\hat{\mathbf{h}}^{[k+1]}$ converges to a saddle point. The pseudocode of our proposed method is summarized in Algorithm 1. The LCCF is first initialized based on half of the training samples; we then add $B/maxiter$ samples to the training set at each iteration, which is one kind of data sampling. Subsequently, a set of sub-filters (solution sampling) is calculated, which is further used to constrain our final solution.

Algorithm 1 Solve LCCF using SADMM
1: Set $k = 0$, $\varepsilon_{best} = +\infty$, $\eta = 0.7$
2: Initialize $\sigma^{[0]} = 0.25$ (suggested in [14])
3: Initialize $\hat{\mathbf{g}}^{[0]}$ and $\hat{\mathbf{h}}^{[0]}$ based on MCCF
4: Initialize $B$, where $B$ denotes the size of half of the training samples; $maxiter = 12$
5: repeat
6:   $\mathbf{H} = \sum_{i=1}^{B}\hat{\mathbf{X}}_i^T\hat{\mathbf{X}}_i + \lambda\mathbf{I} + \sigma^{[k]}\mathbf{I}$
7:   $\hat{\mathbf{h}}^{[k+1]} = \mathbf{H}^{-1}\big(\sum_{i=1}^{B}\hat{\mathbf{X}}_i^T\hat{\mathbf{y}}_i + \sigma^{[k]}\hat{\mathbf{g}}^{[k]}\big)$
8:   $\varepsilon = \|\hat{\mathbf{h}}^{[k+1]} - \hat{\mathbf{h}}^{[k]}\|_2$
9:   if $\varepsilon < \eta \times \varepsilon_{best}$ then
10:    $\sigma^{[k+1]} = \sigma^{[k]}$
11:    $\varepsilon_{best} = \varepsilon$
12:  else
13:    $\sigma^{[k+1]} = 2\sigma^{[k]}$
14:  end if
15:  $\hat{\mathbf{g}}^{[k+1]} = \Phi(\hat{\mathbf{h}}^{[k+1]}, \hat{\mathbf{h}}^{[0:k]})$
16:  $k \leftarrow k + 1$, $B \leftarrow B + B/maxiter$
17: until some stopping criterion is met, i.e., the maximum number of iterations ($maxiter = 12$).


4 Experiments

In this section, to evaluate the performance of our proposed method, we carry out experiments in two different applications: eye localization and car detection. We employ a two-dimensional Gaussian function with the same parameter to generate a single-channel output whose peak is located at the coordinate of the target. All images are normalized before training and testing: they are power-normalized to have zero mean and a standard deviation of 1.0.

Subset, subspace and robustness evaluation: We first introduce how to create different kinds of subsets for calculating the sub-filter subspace. We add noise or occlusions to the training and testing sets in order to show how LCCF gains robustness through the projection onto a subspace. More specifically, we first select an initialized subset containing half of all training samples (its size is denoted as $B$), and then generate the other subsets by adding $B/maxiter$ samples to the initialized subset, with $maxiter$ representing the maximum number of iterations. Based on the initialized subset, we obtain $\hat{\mathbf{h}}^{[0]}$, and the other sub-filters in the subsequent iteration steps, e.g., $\hat{\mathbf{h}}^{[k]}$ for the $k$th iteration. With respect to the robustness evaluation, the basic idea for both applications is to measure the algorithm accuracy when adding Gaussian noise or occlusions to the training and test sets. For both applications, the HOG feature is extracted by setting the number of gradient orientations to 5 and the size of both block and cell to [5,5], as suggested in [3].

4.1 Eye localization

In the first experiment, we evaluate our method on eye localization, comparing our algorithm with several leading correlation filters in the literature, including MCCF [8], correlation filters with limited boundaries (CFwLB) [12], ASEF [6] and MOSSE [7]. The experiments are randomly repeated ten times.

CMU Multi-PIE: The CMU Multi-PIE face database is used for this experiment, consisting of 902 frontal faces with neutral expression and normal illumination. We randomly selected 500 images for training and the rest for testing. All images were cropped to the same size of 128 × 128 pixels with fixed coordinates of the left and right eyes. We train a 128 × 128 filter for the right eye using full face images, following [8]. Similar to ASEF and MOSSE, we define the desired response as a 2D Gaussian function with a spatial variance of 2. Eye localization was performed by correlating the filters over the testing images and then selecting the peak of the output as the predicted eye location.

Results and analysis: To evaluate the performance of our algorithm, we use the so-called fraction of the interocular distance, defined based on the actual and predicted positions of the eyes. This distance is computed as:

$$d = \frac{\|\mathbf{p}_i - \mathbf{m}_i\|_2}{\|\mathbf{m}_l - \mathbf{m}_r\|_2}, \qquad (14)$$


where $\mathbf{p}_i$ is the predicted location obtained by our method, $\mathbf{m}_i$ is the ground truth of the target of interest, i.e., the coordinates of an eye, and $\mathbf{m}_l$ and $\mathbf{m}_r$ are the coordinates of the left and right eyes. The calculated distance $d$ is then compared with a threshold $\tau$: if $d < \tau$, the result is considered correct. We count the number of correct instances under this threshold and compute the ratio over the total number of test samples as the localization rate. The localization rates achieved under different values of $maxiter$ are shown in Fig. 2. As can be seen from this figure, LCCF obtains the best accuracy when $maxiter = 12$; we therefore use this setting for all the following experiments. Additionally, we also test the convergence of our method when $maxiter = 12$. The performance increases monotonically with the iteration number, which verifies our proof.
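The evaluation protocol of Eq. (14) can be sketched as follows; the function name and the `(N, 2)` array layout are our own choices for illustration.

```python
import numpy as np

def localization_rate(pred, gt, m_l, m_r, tau):
    """Fraction-of-interocular-distance evaluation, Eq. (14).

    pred, gt: (N, 2) predicted / ground-truth eye coordinates;
    m_l, m_r: (N, 2) left / right eye coordinates used for normalization.
    A prediction counts as correct when d < tau; the localization rate
    is the fraction of correct predictions.
    """
    d = (np.linalg.norm(pred - gt, axis=1)
         / np.linalg.norm(m_l - m_r, axis=1))
    return float(np.mean(d < tau))
```

Sweeping `tau` over a range of thresholds produces the localization-rate curves reported in the figures.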

Fig. 2. The localization rates under different maximum iteration numbers for LCCF, and the convergence of our method when fixing maxiter to 12.

We also compare LCCF with MCCF in the robustness evaluation. As shown in Fig. 3, LCCF achieves a much better performance than MCCF, especially when severe noise is added. In Fig. 4, we compare LCCF with the state-of-the-art methods, demonstrating that LCCF is less affected by noise and occlusion than the others. In particular, when the test set is extremely noisy, LCCF and CFwLB are much better than the others, and LCCF achieves a much better performance than CFwLB in the occlusion case. In these experiments, all methods are based on the same training and testing sets. For the original setting, we randomly choose 500 images for training and the other 402 for testing, and the experiment is repeated ten times. For the extended setting, we add noise or occlusion to each image, generating in total 1000 training images. Regarding the test images, we randomly select a part of the images from the data set to add noise or occlusion, resulting in 402 testing images. The training and testing sets, as well as the source code, will be made publicly available.

Labeled Faces in the Wild (LFW) database: In the eye localization experiments, we also use face images from the Labeled Faces in the Wild (LFW) database.

Fig. 3. Performance comparison between LCCF and MCCF on CMU Multi-PIE under Gaussian noise variances of 0.01, 0.05, 0.1 and 0.3.

Fig. 4. The results of LCCF compared to the state-of-the-art correlation filters on CMU Multi-PIE. The noise variance varies from 0.05 (slight) to 0.1 (heavy). The first column shows results on the original images, the second with occlusion, the third with slight noise, and the fourth with heavy noise and occlusion.

Fig. 5. The results of LCCF and MCCF on LFW: localization rates for five landmarks under Gaussian noise variances of 0, 0.01 and 0.05, the root mean square error (RMSE) per landmark, and some visualization results of LCCF (the first row shows correct samples while the second shows errors).

The LFW database contains more than ten thousand face images, covering people of different ages, genders and races. The training samples also take into account the diversity of lighting, pose, quality, makeup and other factors. We randomly choose 1000 face images of 250 × 250 pixels, split half and half between training and testing. Fig. 5 shows the superior robustness of our algorithm. Similar to the results on the CMU dataset, the performance gap between our algorithm and the state-of-the-art methods grows as the noise intensity increases. Considering that LCCF is implemented based on MCCF, on this dataset we only compare these two methods. We could not run the CFwLB code on this database because it requires the facial points to be at the same positions in all images. However, on the car dataset, we provide comparisons of all methods to verify the effectiveness of our method.

4.2 Car detection

The car detection task is similar to eye localization. We choose 938 sample images from the MIT Streetscape database [20]. All of them are cropped to 360 × 360 pixels. In the training procedure, the HOG feature is used as the input, and the peak of the desired response is located at the center of the car. We use a 100 × 180 pixel rectangle to extract the car block and remove the rest of the street. At test time, the input image is correlated with the filter to generate the corresponding peak location. Different from the eye localization experiment, we use the pixel deviation from the true position instead of the normalized distance for evaluation [8]. The results of this experiment are shown in Fig. 6.

Fig. 6. Experimental results of LCCF compared to other correlation filters for car detection on the original, occlusion, and noise test sets (first, second, and third column, respectively). The variance of the Gaussian noise is 0.05.

From Fig. 6, we can observe that the performances of most methods are quite close in the case of no occlusion and no noise. However, LCCF shows much better robustness when the testing data suffer from noise and occlusion. This performance can be attributed to the projection onto a subspace that contains various kinds of variations. With respect to complexity, LCCF is very fast in the testing process, since our algorithm only needs element-wise products in the FFT domain. When training $D$-dimensional feature vectors with $maxiter$ iterations, LCCF has a time cost of $O(ND\log D)$ for the FFT calculation (once per image), which is the same as that of MCCF. The memory cost is $O(maxiter \cdot KD)$ for LCCF and $O(K^2 D)$ for MCCF. Considering that $maxiter$ is not large, LCCF is quite efficient in both training and testing.
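The test-time step described above, only element-wise products in the FFT domain, can be sketched for 1-D, $K$-channel inputs; the function name and the `(K, D)` layout are illustrative assumptions consistent with the notation of Eq. (2).

```python
import numpy as np

def apply_filter(x, h_hat):
    # Test-time detection: an element-wise product per channel in the FFT
    # domain, summed over channels; the peak of the inverse transform is
    # taken as the predicted location.
    x_hat = np.fft.fft(x, axis=1)                 # per-channel FFT
    response = np.fft.ifft(np.sum(x_hat * h_hat, axis=0)).real
    return int(np.argmax(response)), response
```

The cost is dominated by the FFTs, so detection scales as $O(KD\log D)$ per image regardless of how the filter was trained.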


5 Conclusions

In this paper, we have proposed the latent constrained correlation filters (LCCF) method and introduced a subspace ADMM algorithm to solve the new learning model. The theoretical analysis reveals that the new subspace ADMM converges twice as fast as the original ADMM. The experimental results have shown consistent advantages over the state-of-the-art methods when applying LCCF to eye localization and car detection. In future work, we will apply our algorithm to other applications, e.g., tracking, face recognition and action recognition.

References

1. C. F. Hester and D. Casasent. Multivariant technique for multiclass pattern recognition. Applied Optics, pp. 1758-1761, 1980.
2. A. Mahalanobis, B. V. K. Vijaya Kumar, and D. Casasent. Minimum average correlation energy filters. Applied Optics, pp. 3633-3640, 1987.
3. P. Refregier. Optimal trade-off filters for noise robustness, sharpness of the correlation peak and Horner efficiency. Optics Letters, pp. 829-831, 1991.
4. M. Savvides and B. V. K. Vijaya Kumar. Efficient design of advanced correlation filters for robust distortion-tolerant face recognition. IEEE Conference on Advanced Video and Signal Based Surveillance, pp. 45-52, 2003.
5. A. Mahalanobis, B. V. K. Vijaya Kumar, S. Song, S. Sims, and J. Epperson. Unconstrained correlation filters. Applied Optics, pp. 3751-3759, 1994.
6. D. S. Bolme, B. A. Draper, and J. R. Beveridge. Average of synthetic exact filters. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2105-2112, 2009.
7. D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui. Visual object tracking using adaptive correlation filters. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2544-2550, 2010.
8. H. Kiani Galoogahi, T. Sim, and S. Lucey. Multi-channel correlation filters. IEEE International Conference on Computer Vision (ICCV), pp. 3072-3079, 2013.
9. N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 886-893, 2005.
10. D. Lowe. Object recognition from local scale-invariant features. IEEE International Conference on Computer Vision (ICCV), pp. 1150-1157, 1999.
11. M. Chang. Guiding semi-supervision with constraint-driven learning. Proceedings of the Annual Meeting of the ACL, 2007.
12. H. Kiani Galoogahi, T. Sim, and S. Lucey. Correlation filters with limited boundaries. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4630-4638, 2015.
13. G. Cabanes and Y. Bennani. Learning topological constraints in self-organizing maps. Springer Berlin Heidelberg, 2010.
14. B. Zhang, A. Perina, V. Murino, and A. Del Bue. Sparse representation classification with manifold constraints transfer. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4557-4565, 2015.
15. B. V. K. Vijaya Kumar, A. Mahalanobis, and R. D. Juday. Correlation Pattern Recognition. Cambridge University Press, 2005.
16. H. Kiani, T. Sim, and S. Lucey. Multi-channel correlation filters for human action recognition. IEEE International Conference on Image Processing (ICIP), pp. 1485-1489, 2015.
17. J. F. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 37, no. 3, pp. 583-596, 2015.
18. T. Liu, G. Wang, and Q. Yang. Real-time part-based visual tracking via adaptive correlation filters. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
19. S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 2011.
20. http://cbcl.mit.edu/software-datasets/streetscenes/
