Semantic Hashing
Presented by: Ali Vashaee
IFT725 Neural Networks, with Hugo Larochelle
Salakhutdinov & Hinton, 2009

Outline
• What is semantic hashing
• Unlabeled data
• Multilevel autoencoder
• Results
• Conclusion

Semantic hashing

Document retrieval
• Represent each document as a word-count vector.
• Computing document similarity directly in word-count space can be slow for large vocabularies.
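The cost of raw word-count retrieval can be seen in a short sketch: every query touches every count, so one query costs O(N·V). The corpus size, vocabulary size, and random counts below are all illustrative:

```python
import numpy as np

# Illustrative toy corpus: N documents as V-dimensional word-count vectors.
rng = np.random.default_rng(0)
N, V = 1000, 2000
counts = rng.poisson(lam=0.1, size=(N, V)).astype(float)
query = rng.poisson(lam=0.1, size=V).astype(float)

# Brute-force cosine similarity touches every count: O(N * V) per query.
norms = np.linalg.norm(counts, axis=1) * np.linalg.norm(query)
sims = counts @ query / np.maximum(norms, 1e-12)
top10 = np.argsort(-sims)[:10]  # indices of the 10 most similar documents
```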

Document Retrieval
• N = size of the document corpus
• V = latent variable (code) size
• LSA: O(V log N) with a kd-tree
• Semantic hashing: retrieval time does not depend on the corpus size; it is linear in the length of the shortlist it produces.

Deep auto-encoder

[Figure: deep autoencoder; the input layer is the word-count vector]

First layer RBM
• A “Constrained Poisson Model” is used for modeling word-count vectors.

The constrained Poisson model

p(v_i = n | h) = Poisson(n, N · exp(λ_i + Σ_j h_j w_ij) / Σ_k exp(λ_k + Σ_j h_j w_kj))

p(h_j = 1 | v) = σ(b_j + Σ_i w_ij v_i)

λ_i is the bias of the conditional Poisson model for word i, b_j is the bias of feature j, and N is the length of the document.
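A minimal sketch of the constrained Poisson rates, assuming the softmax-scaled form from Salakhutdinov & Hinton's paper: unnormalized log-rates λ_i + Σ_j h_j w_ij are passed through a softmax over the vocabulary and scaled by the document length, so the expected counts sum to the document length. All names and sizes here are illustrative:

```python
import numpy as np

def poisson_rates(h, W, lam, doc_len):
    """Mean rates of the constrained Poisson visible units.

    Softmax over words scales the rates so they sum to doc_len.
    (Sketch of the model; variable names are ours.)
    """
    logits = lam + h @ W.T            # shape (V,)
    p = np.exp(logits - logits.max())
    p /= p.sum()                      # softmax over the vocabulary
    return doc_len * p                # expected count for each word

rng = np.random.default_rng(1)
V, H = 2000, 500
W = rng.normal(0.0, 0.1, size=(V, H))
lam = np.zeros(V)
h = (rng.random(H) < 0.5).astype(float)
rates = poisson_rates(h, W, lam, doc_len=50.0)
# rates sums to the document length by construction
```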

RBM

Unrolling

[Figure: the pretrained RBM stack is unrolled into an encoder–decoder; both the input and the reconstruction are the word-count vector]

Fine Tuning
• Conjugate gradient is used for fine-tuning.
• The objective is reconstructing the word-count vectors.
• The stochastic binary values of hidden units are replaced with their probabilities.
• The code layer is kept close to binary.
• Deterministic noise forces fine-tuning to find binary codes in the top layer.
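One way the deterministic-noise trick can be sketched: fixed Gaussian noise is added to the inputs of the code units, so only near-saturated sigmoids (activities close to 0 or 1) transmit information reliably, pushing fine-tuning toward binary codes. The noise level and fixed seed below are our illustrative choices, not the paper's values:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def code_layer(x, W, b, noise_std=4.0, rng=None):
    """Code-layer activation with deterministic Gaussian noise.

    A fixed seed makes the noise the same on every pass, so the
    network must learn to shout over it rather than average it out.
    """
    if rng is None:
        rng = np.random.default_rng(42)  # fixed seed => "deterministic" noise
    noise = rng.normal(0.0, noise_std, size=b.shape)
    return sigmoid(x @ W + b + noise)
```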

Activities of 128 bit code

• After fine-tuning, the code activities are thresholded to obtain binary values.
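Thresholding the code activities is a one-liner; the 0.5 cutoff and the sample activities are assumptions for illustration:

```python
import numpy as np

# Sample code-layer activities for two documents (made up).
codes = np.array([[0.93, 0.04, 0.61],
                  [0.12, 0.88, 0.49]])
binary = (codes > 0.5).astype(np.uint8)  # threshold at 0.5 (assumed)
# binary -> [[1, 0, 1], [0, 1, 0]]
```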

Training
• Pretraining: mini-batches of 100 cases.
• Greedy layer-wise pretraining for 50 epochs.
• Weights were initialized with small random values sampled from a zero-mean normal distribution with variance 0.01.
• Fine-tuning: conjugate gradients.
• After fine-tuning, the codes were thresholded to produce binary code vectors.

Hashing
• Hashing produces a shortlist of similar documents in a time that is independent of the size of the document collection and linear in the length of the shortlist.
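The shortlist lookup can be sketched as treating the binary code as a memory address and probing every address within a small Hamming ball of the query code, so the cost depends only on the code length and the ball radius, not on how many documents are stored. The tiny table below is made up for illustration:

```python
from itertools import combinations

def hamming_ball(code, n_bits, radius):
    """Yield all addresses within `radius` bit-flips of `code`."""
    yield code
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            flipped = code
            for b in bits:
                flipped ^= (1 << b)   # flip one chosen bit
            yield flipped

# Hypothetical hash table: binary code -> documents stored at that address.
table = {0b1011: ["doc-7"], 0b1010: ["doc-3"], 0b0011: ["doc-9"]}
shortlist = [d for addr in hamming_ball(0b1011, n_bits=4, radius=1)
             for d in table.get(addr, [])]
```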

Semantic Hashing (Grauman et al.)


Hamming distance
• Very little memory is needed to store the codes.
• The Hamming distance between binary codes is fast to compute.
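When each code is packed into an integer, the Hamming distance reduces to an XOR followed by a popcount; a minimal Python sketch:

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two binary codes packed into ints:
    XOR marks the differing bits, popcount counts them."""
    return bin(a ^ b).count("1")  # int.bit_count() also works on Python 3.10+

assert hamming(0b10110, 0b10011) == 2  # bits 0 and 2 differ
```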

Experimental Results

• Documents are represented as 2000-dimensional word-count vectors.
• Two text datasets: 20-newsgroups and Reuters Corpus Volume I (RCV1-v2).
• Architecture: 2000-500-500 with a code layer of 128 or 30 units.
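The 2000-500-500-128 encoder with the stated initialization (zero-mean normal, variance 0.01, i.e. std 0.1) can be sketched as below; the sigmoid activations and zero biases are our assumptions for illustration:

```python
import numpy as np

# Encoder layer sizes from the slides: 2000-500-500-128.
sizes = [2000, 500, 500, 128]
rng = np.random.default_rng(0)
# Zero-mean normal with variance 0.01 => standard deviation 0.1.
weights = [rng.normal(0.0, 0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def encode(v):
    """Forward pass through the encoder half of the autoencoder."""
    for W, b in zip(weights, biases):
        v = sigmoid(v @ W + b)
    return v

code = encode(np.ones(2000))  # 128-dimensional code, activities in (0, 1)
```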

Class structure of documents • Semantic hashing with a 128-bit code:

20 Newsgroups

• 3.6 ms to search through 1 million documents using 128-bit codes.
• The same search takes 72 ms with 128-dimensional LSA.

Reuters RCV1-v2

Image retrieval (Torralba et al.)

• Nearest neighbors retrieved from a database of 12,900,000 images.

Thank you