Fusing the Global and Local Deep Representation for Effective Object Retrieval
Wang Mao
Public code: https://github.com/wangmaoCS/Fusion_RMAC

Contents
• Introduction
• Motivation
• Method & Results
• Conclusion

1, Introduction: image retrieval

[Figure: a query image is searched against the database images to produce a ranked list of results]

Object retrieval: a rectangle in the query image marks the location of the query object.

1, Introduction: MAC feature

[Figure: input image -> CNN (AlexNet, VGG-net, ...) -> convolutional activations (W × H × K) -> max over the spatial positions of each channel -> K-dim MAC]

Maximum Activations of Convolutions (MAC):
• simple to extract
• low dimension: K = 256 (AlexNet), 512 (VGG-16), 1024 (ResNet)
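A minimal sketch of MAC pooling, assuming the convolutional activations are already available as a NumPy array of shape (K, H, W). The function name `mac` and the random feature map are illustrative only and do not come from the public code.

```python
import numpy as np

def mac(fmap: np.ndarray) -> np.ndarray:
    """Return the K-dim MAC vector of a (K, H, W) activation tensor."""
    k = fmap.shape[0]
    # Flatten the spatial grid and keep the maximum response per channel.
    return fmap.reshape(k, -1).max(axis=1)

# Hypothetical activations standing in for a VGG-16 conv layer (K = 512).
rng = np.random.default_rng(0)
fmap = rng.random((512, 30, 40)).astype(np.float32)
mac_vec = mac(fmap)
print(mac_vec.shape)
```

The output dimension depends only on the number of channels K, not on the image size, which is why the descriptor stays low-dimensional.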

1, Introduction: Regional MAC (R-MAC)

[Figure: convolutional activations (W × H × K) -> MAC extracted from N sub-regions (N = 2 + 6 + 12) -> N K-dim MAC features -> sum pooling and L2 normalization -> K-dim R-MAC]

R-MAC:
1, information from multi-scale regions
2, low dimension
3, easy to extract
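The aggregation can be sketched as follows, under stated assumptions: regions are passed in as (x, y, w, h) boxes in feature-map coordinates, each regional MAC is L2-normalized before the sum, and no whitening is applied. The slide only states "sum pooling, L2 normalization", so the per-region normalization and the hand-picked regions below are assumptions, not the multi-scale grid used by the original method.

```python
import numpy as np

def l2n(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """L2-normalize a vector (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def rmac(fmap: np.ndarray, regions) -> np.ndarray:
    """Aggregate regional MAC vectors of a (K, H, W) tensor into one K-dim vector."""
    k = fmap.shape[0]
    agg = np.zeros(k, dtype=np.float64)
    for x, y, w, h in regions:
        # MAC restricted to one sub-region of the feature map.
        region_mac = fmap[:, y:y + h, x:x + w].reshape(k, -1).max(axis=1)
        agg += l2n(region_mac)          # sum pooling over regions
    return l2n(agg)                     # final L2 normalization

# Hypothetical regions on a small 6 x 9 feature map (two scales only).
rng = np.random.default_rng(0)
fmap = rng.random((8, 6, 9))
regions = [(0, 0, 6, 6), (3, 0, 6, 6), (0, 0, 4, 4), (5, 2, 4, 4)]
desc = rmac(fmap, regions)
print(desc.shape, np.linalg.norm(desc))
```

Because the regional vectors are summed rather than concatenated, the result keeps the same dimension K as a single MAC vector.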

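1, R-MAC based object retrieval: pipeline

Step 1: Initial Retrieval
The query image is described by q and the database images by X; the search ranks the database by q^T X to give the initial results.

Step 2: Reranking by Localization
The top-1k results are reranked by localization, i.e. by searching for the query object on the feature map of each result.

Step 3: Query Expansion
The query and the top-5 results are merged (+) into q1, and a second search by q1^T X gives the final results.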
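Step 1 of the pipeline can be sketched as a single matrix-vector product, assuming X stores one L2-normalized descriptor per column so that q^T X gives cosine similarities. The function name `search` and the toy data are illustrative assumptions.

```python
import numpy as np

def search(q: np.ndarray, X: np.ndarray, top_k: int):
    """Rank database columns of X (K, N) by inner product with the query q (K,)."""
    scores = q @ X                                   # q^T X, one score per image
    order = np.argsort(-scores, kind="stable")[:top_k]
    return order, scores[order]

# Toy database: 10 random unit-length descriptors of dimension 5.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 10))
X /= np.linalg.norm(X, axis=0, keepdims=True)
q = X[:, 3].copy()                                   # query identical to image 3
order, scores = search(q, X, top_k=3)
print(order[0])
```

In this toy case the query is a copy of database image 3, so that image is ranked first with a score of 1.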

2, Motivation

Existing work: deep learning for R-MAC representation

[Figure: related work built on R-MAC (ICLR 2015): SiaMAC (ECCV 2016) and Deep Retrieval (ECCV 2016), with further entries labeled CVPR 2016 and IJCV 2017; application: similarity search, manifold diffusion]

Limitation:
1, targets particular datasets
2, needs annotated data and GPUs

Our purpose:
1, optimize the retrieval pipeline, not R-MAC
2, use public deep models, without training

2, Motivation: the problem of R-MAC (1/3)

[Figure: the query object cropped from the query image, compared with a whole database image]

• Query R-MAC: from the query object
• Database R-MAC: from the whole image

Asymmetric comparison: the similarity is under-estimated in (1) initial retrieval.

2, Motivation: the problem of R-MAC (2/3)

[Figure: object localization result for a query, cropped from the ICLR 2015 paper (R-MAC)]

Unreliable localization. Our results: poor (2) reranking and (3) query expansion.

2, Motivation: the problem of R-MAC (3/3)

[Figure: a query and a false result]

Relying only on local information leads to false results.

3, Method: Initial retrieval with global R-MAC

Step 1: Initial Retrieval

[Figure: query object, query image and database image]

Asymmetric comparison:
• Query R-MAC: from the query object
• Database R-MAC: from the whole image

Symmetric comparison:
• Query R-MAC: from the query image
• Database R-MAC: from the whole image

3, Method: Reranking with fusing R-MACs

Step 2: Reranking by Localization (R)

R-MAC based retrieval: only using local information
Query Object (RMAC_qL) × Located Object (RMAC_dL)

The proposed approach: fusing the local and global R-MAC
[RMAC_qL ; RMAC_qG] × [RMAC_dL ; RMAC_dG]
where RMAC_qG is extracted from the query image and RMAC_dG from the database image.
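The fused comparison can be sketched as below. The slide does not say how the concatenated vector is weighted or normalized, so this sketch assumes equal weights and L2 normalization of each part and of the concatenation; under that assumption the fused score is the average of the local and global cosine similarities. The names `fuse` and `rerank` are hypothetical.

```python
import numpy as np

def l2n(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    return v / (np.linalg.norm(v) + eps)

def fuse(local: np.ndarray, global_: np.ndarray) -> np.ndarray:
    """Build [RMAC_L ; RMAC_G] from unit-normalized parts (assumed equal weights)."""
    return l2n(np.concatenate([l2n(local), l2n(global_)]))

def rerank(q_local, q_global, d_locals, d_globals, shortlist):
    """Re-score shortlisted images with the fused descriptor, best first."""
    qf = fuse(q_local, q_global)
    scores = np.array([qf @ fuse(d_locals[i], d_globals[i]) for i in shortlist])
    order = np.argsort(-scores, kind="stable")
    return [shortlist[j] for j in order], scores[order]

# Two candidates with the same local match but different global context.
q_local, q_global = np.array([1.0, 0.0]), np.array([0.0, 1.0])
d_locals = np.array([[1.0, 0.0], [1.0, 0.0]])    # located-object R-MACs
d_globals = np.array([[1.0, 0.0], [0.0, 1.0]])   # whole-image R-MACs
ranked, scores = rerank(q_local, q_global, d_locals, d_globals, [0, 1])
print(ranked, scores)
```

Both candidates match the query object equally well locally; only the global part separates them, which is the case the slide on false results points at.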

3, Method: QE with fusing R-MACs

Step 3: Query Expansion (QE)

[Figure: query + top-5 results -> q1 -> search by q1^T X over the reranked results (1k) -> final results; local representation vs local+global representation]

QE in the R-MAC based approach:
1, merging: q1 = Mean(q + top-5 results)
2, querying: retrieval in the top-1k results

The proposed approach: concatenating the local and global R-MAC features
RMAC_L -> [RMAC_L ; RMAC_G]
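The two QE steps can be sketched as follows, assuming the shortlist descriptors are stored as columns of X in their current rank order. In the proposed approach these columns would be the fused [RMAC_L ; RMAC_G] vectors. Re-normalizing q1 after averaging is an assumption, and `query_expansion` is a hypothetical name.

```python
import numpy as np

def query_expansion(q: np.ndarray, X: np.ndarray, top_n: int = 5):
    """Average q with the top_n ranked columns of X (K, M), then re-query X."""
    # 1, merging: q1 = Mean(q + top-n results)
    q1 = (q + X[:, :top_n].sum(axis=1)) / (top_n + 1)
    q1 = q1 / (np.linalg.norm(q1) + 1e-12)        # assumed re-normalization
    # 2, querying: retrieval restricted to the shortlist, scored by q1^T X
    scores = q1 @ X
    order = np.argsort(-scores, kind="stable")
    return q1, order

# Toy shortlist of 1000 unit-length descriptors of dimension 16.
rng = np.random.default_rng(0)
X = rng.normal(size=(16, 1000))
X /= np.linalg.norm(X, axis=0, keepdims=True)
q = X[:, 0].copy()
q1, order = query_expansion(q, X, top_n=5)
print(q1.shape, order[:5])
```

Because the second query only touches the shortlist, its cost does not depend on the size of the full database.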

3, Method: Analysis

• The proposed approach:
  • step 1, Initial retrieval: local R-MAC -> global R-MAC
  • step 2, Reranking: local R-MAC -> (local + global) R-MAC
  • step 3, Query expansion: local R-MAC -> (local + global) R-MAC

• Extra computation:
  • extracting the global R-MAC for the query (less than 1s)

• Extra memory:
  • concatenation of local and global R-MAC
  • reranking on top-1k results
  • VGG-16 deep net: 512d R-MAC -> 1024d R-MAC
  • 512*1000*4B = 2MB
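The memory figure can be checked with a few lines of arithmetic, assuming the extra 512 dimensions are stored as 4-byte floats for each of the 1000 shortlisted images. The variable names are illustrative.

```python
extra_dim = 512        # extra dimensions when 512d R-MAC becomes 1024d (VGG-16)
shortlist = 1000       # reranking is done on the top-1k results
bytes_per_value = 4    # assumed float32 storage

extra_bytes = extra_dim * shortlist * bytes_per_value
print(extra_bytes)     # 2048000 bytes, i.e. about 2 MB
```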

3, Results: experimental setup

• Public datasets
  • Oxford5k: 5063 images, 55 queries, 11 landmarks of Oxford Univ
  • Paris6k: 6392 images, 55 queries, 11 landmarks of Paris
  • Flickr100k: 100k distractor images
  • Oxford105k: Oxford5k + Flickr100k
  • Paris106k: Paris6k + Flickr100k

• Comparison
  • ICLR 2015: R-MAC (VGG-16 deep model)
  • ECCV 2016: Sia-RMAC (fine-tuned VGG-16)

• Evaluation
  • mean Average Precision (mAP)
  • the area under the precision-recall curve, in (0,1]
  • the higher, the better
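A minimal sketch of the metric, using plain non-interpolated average precision. The official evaluation scripts of these datasets may differ in detail (for example in how ambiguous images are handled), so this illustrates the idea rather than reproducing the exact protocol; the function names are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids) -> float:
    """Mean of the precision values at the rank of each relevant item."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, idx in enumerate(ranked_ids, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(rankings, relevants) -> float:
    """Average the per-query AP over all queries."""
    aps = [average_precision(r, g) for r, g in zip(rankings, relevants)]
    return sum(aps) / len(aps)

# Two toy queries: relevant items found at ranks (1, 3) and (2,).
rankings = [[1, 2, 3], [7, 8, 9]]
relevants = [[1, 3], [8]]
print(mean_average_precision(rankings, relevants))
```

For the first query AP is (1/1 + 2/3) / 2 and for the second it is 1/2, so the mean rewards rankings that place relevant images early.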

3, Results: Initial retrieval

Replacing the local query R-MAC with the global one improves the initial retrieval accuracy.

[Table: retrieval accuracy (mAP%) on the Oxford and Paris datasets (ICLR2015, VGG-16)]

[Table: retrieval accuracy (mAP%) on the Oxford and Paris datasets (ECCV2016, siaMAC)]

3, Results: Reranking & Query Expansion

The fusion of local and global R-MAC gives a more comprehensive representation.

[Table: retrieval accuracy (mAP%) on the Oxford and Paris datasets (ICLR2015, VGG-16)]

[Table: retrieval accuracy (mAP%) on the Oxford and Paris datasets (ECCV2016, siaMAC)]

Conclusion

• Revised R-MAC
  • step 1, Initial retrieval: local R-MAC vs global R-MAC
  • step 2, Reranking: local R-MAC vs (local+global) R-MAC
  • step 3, Query expansion: local R-MAC vs (local+global) R-MAC

• Pros
  • retrieval accuracy improvement
  • low extra computation and memory cost

• Cons
  • suited to landmark image retrieval, not to generic object image retrieval
