MatchPlus: A CONTEXT VECTOR SYSTEM FOR DOCUMENT

Recommend Documents

A Context-based Document System for Wearable ...

linked to specific documents, context about the user and the explicit use of time. None of the above systems, however, directly support all three forms of context.

A Context-based Document System for Wearable Computers

A Context-based Document System for Wearable Computers. Kent Lyons. 1. , Thad Starner. 1. , Lonnie Harvel. 2. {kent, thad, ldh}@cc.gatech.edu. 1. College of ...

Vector Space Models for Scientific Document Summarization

Jun 5, 2015 - Vector Space and Language Models for Scientific Document Summarization. John M. .... topic signature model where 10â50 bigrams remain.

Vector Space Models for Scientific Document Summarization

May 31, 2015 - John M. Conroy .... We use the OCCAMS algorithm (Davis et al., 2012) to select sentences ... are nonnegative and W has k columns and H has.

SCALABLE CONTEXT-BASED MOTION VECTOR CODING FOR ...

Email: vvalenti,am,barlaudÂ¡ @i3s.unice.fr. *Dipartimento di Ingegneria Elettronica e delle ... coding scheme for video compression which combines con- strained ...

SCALABLE CONTEXT-BASED MOTION VECTOR CODING FOR ...

ing steps, usefull for transmission on a variable bitrate envi- ronment. They simply transmit the vectors using prediction and entropy coding. Moreover, for each ...

A Context Model for Context-aware System Design towards - CiteSeerX

advantages of object-oriented models for distributed context handling and those of ontology-based models for .... A Context Entity is a lightweight software component ..... Rules can be written in the. Semantic .... custom namespace. When a ...

A Context Model for Context-aware System Design towards - CiteSeerX

The most common design approach for distributed context-aware frameworks is an infrastructure ... A Context Entity is a lightweight software component ..... custom namespace. When a .... Sons, Ltd, Chichester UK (2004) 69-92. 29. Strang, T.

A DOCUMENT MANAGEMENT SYSTEM FOR THE CONSERVATION

A Document Management System (DcMS) for the efficient organization and .... recommendations, both documents also proposed by ICOMOS(1964, 2003), were.

A DOCUMENT MANAGEMENT SYSTEM FOR THE CONSERVATION ...

University of Minho (UMinho), in partnership with the Centre of Computer ..... by using a PHP scripting language, in combination with a MySQL database.

A Medical Document Classification System for Heart

Weka is free software available under the GNU General Public License. The Weka workbench contains a collection of visualization tools and algorithms for data ...

Incremental Context Mining for Adaptive Document Classification

Rey-Long Liu. Dept. of Information Management. Chung-Hua University. HsinChu, Taiwan, R.O.C.. Tel: 886-3-5186530. Email: [email protected]. Yun-Ling Lu.

A Structured Vector Space Model for Word Meaning in Context

sentence meaning. We argue that existing models for this task do not take syntactic structure sufficiently into account. We present a novel structured vector ...

A Structured Vector Space Model for Word Meaning in Context

Semantic spaces are a popular framework for the representation of .... vector for wi correlates with human reading times. ...... structures for answer extraction.

A support vector machine-based context-ranking model for question ...

Oct 26, 2012 - type of the given question and instructs the context-ranking model to re-rank ..... The best1 one serves as the passage score for passages with multiple focus ..... [15] G.G. Lee, J.Y. Seo, S.W. Lee, H.M. Jung, B.H. Cho, C.K. Lee, ...

A Structured Vector Space Model for Word Meaning in Context

sentence meaning. We argue that existing models for this task do not take syntactic structure sufficiently into account. We present a novel structured vector ...

SoRTESum: A Social Context Framework for Single-Document ...

This paper proposes a framework named SoRTESum to take advantage of information from Twitter viz. Diversity and reflection of document content to generate ...

Norma-System: A Legal Document System for Managing Consolidated ...

management systems based on Norma-System software were put together for ..... and heterogeneous databases are best built using finepoint structure models, .... 1. multiauthor, âStudio di fattibilitÃ per la realizzazione del progetto Â«Accesso ...

Language Independent System for Document

Oct 21, 2011 - another language. This explains why these services are at ..... shoguns. 66.900419 tokugawa. 54.892651 edo. 46.252141 samurai. 46.252141.

A Multiple Access System for a Disjunctive Vector ... - Semantic Scholar

Signal-code construction. Probability of denial. Relative group rate. A-channel. Let us denote the number of active users by S, S â©¾ 2. Input at time t. Vectors x. (t).

Improvements of a vector control system for a mini ... - UPCommons

The main objective of this work is to design and implement a vector control system ... vector control system based on the combustion pressure purge and through.

A Method for Verifying a Vector-Based Text Classification System

vector components. Fortunately, the change is usually diminutive; thus, we simply limit the similarity measurement to the set of JDs that exist in both versions.

A model for secure multimedia document database system in a

systems (MMIS) has become a central issue in developing large-scale .... architecture of a secure distributed multimedia document database system. In Section ...

A knowledge-based system approach for a context ... - Semantic Scholar

Sep 6, 2011 - tions with the properties of context-awareness and ad hoc communi- cations. .... The Push or Provisioning Module manages the automatic distri- bution of ... text information about the device and produces a context profile.

MatchPlus: A CONTEXT VECTOR SYSTEM FOR DOCUMENT

Download PDF

0 downloads 0 Views 83KB Size Report

Comment

MatchPlus: A CONTEXT VECTOR SYSTEM FOR. DOCUMENT RETRIEVAL. Stephen L Gallant, Principal Investigator. William R. Caid, Project Manager. HNC ...

MatchPlus: A C O N T E X T DOCUMENT

VECTOR SYSTEM RETRIEVAL

FOR

Stephen L Gallant, Principal Investigator William R. Caid, Project Manager HNC, Inc. 5501 Oberlin Drive San Diego, CA 92121 PROJECT

GOALS

There are two primary goals for Mat&Plus. First we want to incorporate into the system a notion of similarity of use. For example, if a query deals with 'autos' we want to be able to recognize as relevant a document with many mentions of 'cars'.

• Hand-entered queries may be given using a simple syntax. Terms, paragraphs, and documents can comprise a query (all optionally weighted), along with an (optional) Boolean filter. Documents are always returned in order by estimated likelihood of relevance.

Second, we want to apply machine learning algorithms to improve both ad-hoc retrieval and routing performance. Several different algorithms come into play here:

• Documents may be "highlighted" to show hotspots, or areas of maxinmm correspondence with the query.

• a "bootstrap" algorithm develops context vector representations for stems so that similar stems have similar vector representations

• We have implemented routing using neural network learning algorithms. This resulted in a 20-30% improvement compared with the automated ad-hoc system.

• neural network algorithms produce routing queries from initial queries and lists of relevant and nonrelevant documents

• Lists of stems closest to a given stem provide useful and interesting insight into the system's vector representations.

• clustering algorithms help generate a word-sense disambiguation subsystem (being implemented) • neural network algorithms interactively improve adhoc user queries (being implemented) • clustering algorithms can also speed retrieval algorithms using a new "cluster tree" pruning algorithm (planned) A context vector representation is central to all MatchPlus system capabilities. Every word (or stem), document (part), and query is represented by a fixed length vector with about 300 real-valued entries. For any two of these items, we can easily compute a similarity measure by taking a dot product of their respective vectors. This gives a build-in, generalized thesaurus capability to the system. RECENT

RESULTS

• We have built a system for 800,000 documents (2 GB of text). This system takes Tipster topics, automatically generates queries, and performs retrievals. 396

PLANS

FOR

THE

COMING

YEAR

We have been running many bootstrap learning experiments, and some variations have resulted in significant improvements to performance. We expect that this improvement will carry over to all aspects of the system~ including routing. Currently we are implementing word sense disambiguation. We hope that this will give performance improvements, possibly even eliminating the need for phrase processing. This module should also be able to serve as a stand-alone package, providing help for machine translation and speech understanding systems. We plan to apply learning algorithms for automated interactive query improvement in a manner similar to our approach with routing. It seems likely that this will give a significant boost to ad-hoc query performance. Finally, we are performing additional learning experiments to improve routing.