Fusing the Global and Local Deep Representation for Effective Object Retrieval
Wang Mao
public code: https://github.com/wangmaoCS/Fusion_RMAC
Contents
• Introduction
• Motivation
• Method & Results
• Conclusion
1, Introduction: image retrieval
[Figure: Query Image -> Search over Database Images -> Results]
Object retrieval: a rectangle marks the location of the query object in the query image.
1, Introduction: MAC feature
[Figure: Input Image -> CNN (Alexnet, VGG-net, ...) -> W x H x K convolutional activations -> Max -> K-dim MAC]
Maximum Activations of Convolutions (MAC):
• simple to extract
• low dimension, K=256 (Alexnet), 512 (VGG-16), 1024 (Resnet)
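The MAC feature described above can be sketched in a few lines. This is a minimal illustration, assuming the convolutional activations are already available as an (H, W, K) NumPy array; the function name `mac` and the random stand-in activations are hypothetical and not taken from the public code.

```python
import numpy as np

def mac(activations: np.ndarray) -> np.ndarray:
    """Return the K-dim MAC: the maximum of each channel over all spatial positions."""
    if activations.ndim != 3:
        raise ValueError("expected an (H, W, K) array")
    # Collapse the two spatial axes (H and W), keeping one value per channel.
    return activations.max(axis=(0, 1))

# Random stand-in for VGG-16 activations (K = 512); a real CNN would supply these.
rng = np.random.default_rng(0)
fmap = rng.random((30, 40, 512))
descriptor = mac(fmap)
print(descriptor.shape)
```

The descriptor length depends only on K, not on the image size, which is why MAC stays low-dimensional.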
1, Introduction: Regional MAC (R-MAC)
[Figure: W x H x K convolutional activations -> extracting MAC from sub-regions (N sub-regions, 2+6+12) -> N K-dim MAC features -> sum pooling, L2 normalization -> K-dim R-MAC]
R-MAC:
1, information from multi-scale regions
2, low dimension
3, easy to extract
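The R-MAC aggregation can be sketched as follows. This is only an illustration of the steps named on the slide (regional MAC, sum pooling, L2 normalization). The region list is a hypothetical toy layout, not the 2+6+12 multi-scale grid, and normalizing each regional MAC before summing is an assumption.

```python
import numpy as np

def l2_normalize(v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    return v / (np.linalg.norm(v) + eps)

def rmac(activations: np.ndarray, regions) -> np.ndarray:
    """Aggregate regional MACs into a single K-dim descriptor.

    activations: (H, W, K) convolutional activations.
    regions: iterable of (top, left, height, width) boxes in feature-map coordinates.
    """
    aggregated = np.zeros(activations.shape[2])
    for top, left, height, width in regions:
        patch = activations[top:top + height, left:left + width, :]
        # MAC of the sub-region, normalized (assumed), then sum-pooled.
        aggregated += l2_normalize(patch.max(axis=(0, 1)))
    return l2_normalize(aggregated)

rng = np.random.default_rng(0)
fmap = rng.random((8, 8, 16))
# Toy regions: the whole map plus its four quadrants (illustrative only).
regions = [(0, 0, 8, 8), (0, 0, 4, 4), (0, 4, 4, 4), (4, 0, 4, 4), (4, 4, 4, 4)]
descriptor = rmac(fmap, regions)
print(descriptor.shape)
```

Because the regional vectors are summed rather than concatenated, the result keeps the same K dimensions as a single MAC.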
1, R-MAC based object retrieval: pipeline
Step 1: Initial Retrieval
Query Image (q) -> Search (q^T X) over Database Images (X) -> Initial Results
Step 2: Reranking by Localization
Top 1k Results -> Localization (searching on feature map) -> Rerank Results
Step 3: Query Expansion
Query + top-5 results -> q1 -> Search (q1^T X) -> Final Results
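Steps 1 and 3 of the pipeline reduce to inner-product search, which the sketch below illustrates. It assumes the database descriptors are stored as the columns of a K x N matrix X. The helper names `search` and `expand_query` are hypothetical, and the localization-based reranking of step 2 is omitted because it needs the feature maps themselves.

```python
import numpy as np

def search(q: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Rank the database columns of X (K x N) by the score q^T X, best first."""
    scores = q @ X
    return np.argsort(-scores, kind="stable")

def expand_query(q: np.ndarray, X: np.ndarray, ranked: np.ndarray, top: int = 5) -> np.ndarray:
    """Average the query with its top-ranked results to form q1."""
    stacked = np.column_stack([q, X[:, ranked[:top]]])
    # The scale of q1 does not change the ranking produced by q1^T X.
    return stacked.mean(axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((512, 1000))
X /= np.linalg.norm(X, axis=0)          # unit-norm database descriptors
q = X[:, 7] + 0.1 * rng.standard_normal(512)
q /= np.linalg.norm(q)

initial = search(q, X)                   # step 1: initial retrieval
q1 = expand_query(q, X, initial, top=5)  # step 3: query expansion
final = search(q1, X)
```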
2, Motivation
Existing work: deep learning for R-MAC representation
• R-MAC (ICLR 2015)
• SiaMAC (ECCV 2016)
• Deep Retrieval (ECCV 2016, IJCV 2017)
• Application: Similarity Search, Manifold Diffusion (CVPR 2016)
Limitation:
1, targets particular datasets
2, needs annotated data & GPUs
Our purpose:
1, optimize the retrieval pipeline, not R-MAC
2, utilize public deep models, without training
2, Motivation: the problem of R-MAC (1/3)
[Figure: Query Image with the Query Object marked, next to a Database Image]
Query R-MAC: from query object
Database R-MAC: from whole image
Asymmetric comparison: under-estimated similarity in (1) initial retrieval
2, Motivation: the problem of R-MAC (2/3)
[Figure: Query, the object localization result cropped from the ICLR 2015 paper (R-MAC), and our results]
Unreliable localization: bad (2) reranking & (3) query expansion
2, Motivation: the problem of R-MAC (3/3)
[Figure: Query and a false result]
Relying only on local information leads to false results
3, Method: Initial retrieval with global R-MAC
Step 1: Initial Retrieval
[Figure: Query Image with the Query Object marked, next to a Database Image]
Asymmetric Comparison (R-MAC based retrieval):
• Query R-MAC: from query object
• Database R-MAC: from whole image
Symmetric Comparison (the proposed approach):
• Query R-MAC: from query image
• Database R-MAC: from whole image
3, Method: Reranking with fusing R-MACs
Step 2: Reranking by Localization (R)
R-MAC based retrieval: only using local information
Query Object (RMAC_qL) × Located Object (RMAC_dL)
The proposed approach: fusing the local and global R-MAC
[RMAC_qL ; RMAC_qG] × [RMAC_dL ; RMAC_dG]
(RMAC_qG: from the Query Image, RMAC_dG: from the Database Image)
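The fused comparison can be sketched as a concatenation followed by an inner product. This is a minimal illustration, assuming all four descriptors are unit-norm K-dim vectors. The function names are hypothetical, and any re-normalization of the concatenated vector is left out because the slides do not specify one.

```python
import numpy as np

def fuse(local: np.ndarray, global_: np.ndarray) -> np.ndarray:
    """Concatenate the local and global R-MAC into one 2K-dim vector."""
    return np.concatenate([local, global_])

def fused_similarity(q_local, q_global, d_local, d_global) -> float:
    """Score [RMAC_qL ; RMAC_qG] against [RMAC_dL ; RMAC_dG] by inner product."""
    return float(fuse(q_local, q_global) @ fuse(d_local, d_global))

rng = np.random.default_rng(0)

def random_unit(k: int = 512) -> np.ndarray:
    v = rng.standard_normal(k)
    return v / np.linalg.norm(v)

q_l, q_g, d_l, d_g = (random_unit() for _ in range(4))
score = fused_similarity(q_l, q_g, d_l, d_g)
```

The inner product of two concatenated vectors equals the sum of the local and global inner products, so under these assumptions the fused score is the local similarity plus the global similarity.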
3, Method: QE with fusing R-MACs
Step 3: Query Expansion (QE)
[Figure: Query + top-5 results -> q1 -> Search (q1^T X) in the reranked results (1k) -> Final Results]
QE in R-MAC based approach:
1, merging: q1 = Mean(q + top-5 results)
2, querying: retrieval in the top-1k results
The proposed approach: concatenating local and global R-MAC feature
RMAC_L -> [RMAC_L ; RMAC_G]
(local representation -> local+global representation)
3, Method: Analysis
• The proposed approach:
  • step 1, Initial retrieval: local R-MAC -> global R-MAC
  • step 2, Reranking: local R-MAC -> (local + global) R-MAC
  • step 3, Query expansion: local R-MAC -> (local + global) R-MAC
• Extra computation:
  • extracting global R-MAC for the query (less than 1s)
• Extra memory:
  • concatenation of local and global R-MAC
  • reranking on top-1k results
  • VGG-16 deep net: 512d R-MAC -> 1024d R-MAC
  • 512*1000*4B = 2MB
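The extra-memory figure can be checked with a short calculation. The sketch assumes 4-byte (float32) components, as the "4B" in the estimate suggests; the variable names are illustrative.

```python
extra_dims = 512        # global R-MAC appended to each 512-d local R-MAC (VGG-16)
shortlist_size = 1000   # only the top-1k results are reranked
bytes_per_value = 4     # assumed float32 storage

extra_bytes = extra_dims * shortlist_size * bytes_per_value
print(extra_bytes)           # 2048000
print(extra_bytes / 1e6)     # 2.048 (MB)
print(extra_bytes / 2**20)   # 1.953125 (MiB)
```

Either way the overhead is about 2 MB, and it does not grow with the database size because it applies only to the reranked shortlist.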
3, Results: experimental setup
• Public datasets
  • Oxford5k: 5063 images, 55 queries, 11 landmarks of Oxford Univ
  • Paris6k: 6392 images, 55 queries, 11 landmarks of Paris
  • Flickr 100k: 100k distractor images
  • Oxford105k: Oxford5k + Flickr100k
  • Paris106k: Paris6k + Flickr100k
• Comparison
  • ICLR 2015: R-MAC (VGG-16 deep model)
  • ECCV2016: Sia-RMAC (Fine-tuned VGG-16)
• Evaluation
  • mean Average Precision (mAP)
  • the area under the precision-recall curve, (0,1]
  • the higher, the better
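For reference, mAP can be sketched as below. This uses a common simplified definition (the mean of the precision values at each relevant rank) and is not the official Oxford/Paris evaluation script, which also handles "junk" images and interpolates the curve. The function names are hypothetical.

```python
import numpy as np

def average_precision(ranked_relevance) -> float:
    """AP for one query, given 0/1 relevance flags in ranked order."""
    rel = np.asarray(ranked_relevance, dtype=bool)
    if not rel.any():
        return 0.0
    hits = np.cumsum(rel)                    # relevant items seen so far
    ranks = np.arange(1, len(rel) + 1)
    precision_at_hits = hits[rel] / ranks[rel]
    return float(precision_at_hits.mean())

def mean_average_precision(per_query_relevance) -> float:
    """mAP: the mean of the per-query AP values."""
    return float(np.mean([average_precision(r) for r in per_query_relevance]))

# Query A: relevant at ranks 1 and 3 -> AP = (1/1 + 2/3) / 2
# Query B: relevant at rank 2       -> AP = 1/2
print(mean_average_precision([[1, 0, 1, 0], [0, 1]]))
```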
3, Results: Initial retrieval
Replacing the local query R-MAC with the global one improves the initial retrieval accuracy
Retrieval accuracy (mAP%) on the Oxford and Paris datasets (ICLR2015, VGG-16)
Retrieval accuracy (mAP%) on the Oxford and Paris datasets (ECCV2016, siaMAC)
3, Results: Reranking & Query Expansion
The fusion of local and global R-MAC gives a more comprehensive representation
Retrieval accuracy (mAP%) on the Oxford and Paris datasets (ICLR2015, VGG-16)
Retrieval accuracy (mAP%) on the Oxford and Paris datasets (ECCV2016, siaMAC)
Conclusion
• Revised R-MAC
  • step 1, Initial retrieval: local R-MAC vs global R-MAC
  • step 2, Reranking: local R-MAC vs (local+global) R-MAC
  • step 3, Query expansion: local R-MAC vs (local+global) R-MAC
• Pros
  • retrieval accuracy improvement
  • low extra computation and memory cost
• Cons
  • suited to landmark image retrieval, not to generic object image retrieval