Building an Entity-Centric Stream Filtering Test Collection for TREC 2012

John R. Frank^1, Max Kleiman-Weiner^1, Daniel A. Roberts^1, Feng Niu^2, Ce Zhang^2, Christopher Ré^2, Ian Soboroff^3

^1 KBA Organizers, Massachusetts Institute of Technology, [email protected]
^2 University of Wisconsin-Madison
^3 National Institute of Standards and Technology, Gaithersburg, MD, [email protected]

ABSTRACT

The Knowledge Base Acceleration (KBA) track in TREC 2012 focused on a single task: filter a time-ordered corpus for documents that are highly relevant to a predefined list of entities. KBA differs from previous filtering evaluations in two primary ways: the stream corpus is >100x larger than previous filtering collections, and the use of entities as topics enables systems to incorporate structured knowledge bases (KBs), such as Wikipedia, as external data sources. A successful KBA system must do more than resolve the meaning of entity mentions by linking documents to the KB: it must also distinguish centrally relevant documents that are worth citing in the entity's Wikipedia article. This combines thinking from natural language processing (NLP) and information retrieval (IR).

Filtering tracks in TREC have typically used topics described by a set of keyword queries or short descriptions, and annotators have generated relevance judgments based on their personal interpretation of the topic. For TREC 2012, we selected a set of filter topics based on Wikipedia entities: 27 people and 2 organizations. Such named entities are more familiar in NLP than in IR. We also constructed an entirely new stream corpus spanning 4,973 consecutive hours from October 2011 through April 2012. It contains over 400M documents, which we augmented with named entity classification tagging for the ~40% of documents identified as English. Each document has a timestamp that places it in the stream.

The 29 target entities were mentioned infrequently enough in the corpus that NIST assessors could judge the relevance of most of the mentioning documents (~91%). Judgments for documents from before January 2012 were provided to TREC teams as training data for filtering documents from the remaining hours. Run submissions were evaluated against the assessor-generated list of citation-worthy documents. We present peak F_1 scores averaged across the entities for all run submissions. High scoring systems used a variety of approaches, including simple name matching, names of related entities from the knowledge base, and support vector machines. Top scoring systems achieved F_1 scores in the high 30s or low 40s depending on score averaging techniques. We discuss key lessons learned at the end of the paper.

Categories & Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Information Filtering; H.3.m [Information Storage and Retrieval]: Miscellaneous – Test Collections; I.2.7 [Natural Language Processing]: Text Analysis – Language Parsing and Understanding

General Terms: Experimentation, Measurement

1. Introduction

We organized TREC KBA to help accelerate the construction and maintenance of large KBs. KBA builds on ideas from the TAC KBP evaluation [Ji 2011], which has shown that entity linking is an increasingly well understood problem. KBA pushes entity linking algorithms to larger data sizes and pushes beyond coreference resolution to focus on the user task of building a KB. KBA seeks cross-pollination of ideas between NLP and IR.

As a filtering task, KBA systems must decide on each document using only the data available at that point in the stream. At the start of the evaluation, participants were given "urlname" identifiers of the 29 target entities and relevance rating judgments generated by NIST assessors for documents from before 2012, i.e., documents with stream_time.epoch_ticks earlier than 00:00:00 UTC on January 1, 2012.
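To make the stream-filtering setup concrete, the following sketch iterates a time-ordered stream and scores each evaluation-period document against each target entity, respecting the constraint that a decision may use only what has been seen so far. The StreamItem fields, the score_fn hook, and the cutoff constant are illustrative assumptions, not the corpus's actual serialization format; the cutoff value is simply 00:00:00 UTC on January 1, 2012, expressed in epoch ticks.

```python
from dataclasses import dataclass

# Assumed cutoff between training judgments and evaluation documents:
# 00:00:00 UTC on January 1, 2012, in epoch ticks.
TRAINING_CUTOFF = 1325376000


@dataclass
class StreamItem:
    """Minimal stand-in for a corpus document (field names are assumptions)."""
    stream_id: str
    epoch_ticks: int  # mirrors stream_time.epoch_ticks
    body: str


def filter_stream(stream, entities, score_fn):
    """Yield (entity, stream_id, confidence) for evaluation-period documents.

    `stream` must arrive in timestamp order; a compliant filtering system
    consults only information available up to the current document.
    """
    for item in stream:
        if item.epoch_ticks < TRAINING_CUTOFF:
            continue  # pre-2012 documents carry the training judgments
        for entity in entities:
            confidence = score_fn(entity, item)  # e.g. name match or SVM
            if confidence > 0:
                yield entity, item.stream_id, confidence
```

A real submission would replace score_fn with one of the approaches named above, such as surface-form name matching against the entity's urlname or a classifier trained on the pre-2012 judgments.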
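The headline metric, peak F_1 averaged across entities, can be computed as in the sketch below. Here the peak over confidence cutoffs is taken per entity before macro-averaging, which is one of the averaging orders alluded to in the abstract (an alternative averages precision and recall across entities at each cutoff before taking the peak, and the two can differ). The run and qrels data structures are hypothetical simplifications of the official run and judgment formats.

```python
def f1(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0


def average_peak_f1(run, qrels):
    """Macro-averaged peak F_1.

    run:   {entity: {doc_id: confidence}}        -- system output
    qrels: {entity: set of citation-worthy ids}  -- assessor judgments
    """
    peaks = []
    for entity, scored in run.items():
        relevant = qrels.get(entity, set())
        best = 0.0
        # Sweep each distinct confidence value as a candidate cutoff.
        for cutoff in sorted(set(scored.values())):
            retrieved = {d for d, c in scored.items() if c >= cutoff}
            tp = len(retrieved & relevant)
            best = max(best, f1(tp, len(retrieved) - tp, len(relevant) - tp))
        peaks.append(best)
    return sum(peaks) / len(peaks) if peaks else 0.0
```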