Information Retrieval Group & Machine Learning Group

Predicting query difficulty

David Carmel, Adam Darlow, Shai Fine, Elad Yom-Tov

IBM Haifa Labs

What is query difficulty?


Why predict query difficulty?
- Feedback to the user
- Improving search engine effectiveness
- Additional applications


Talk Outline
- Related work
- Predicting query difficulty: our approach
- Experimental results
- Applications of query prediction:
  - Improving effectiveness
  - Missing content detection
  - Distributed information retrieval


Related work
- Clarity score (Cronen-Townsend et al., SIGIR 2002)
- Reliable Information Access (RIA) workshop (2003): investigated the reasons for performance variability between topics and systems. Failure analysis of several systems on TREC topics identified 10 failure categories; five of the 10 relate to a system's failure to identify all aspects of the topic.

- TREC Robust Track (2004, 2005): rank topics by predicted precision. System performance is measured by comparing the ranking of topics by their actual precision to the ranking of topics by their predicted precision.

- SIGIR'05 workshop on Predicting Query Difficulty: www.haifa.il.ibm.com/sigir05-qp/


Some topics (from the TREC Robust track)

Query (topic's title)       #Relevant docs   Median AP (Robust04 participants)
Overseas Tobacco Sales      38               0.067
Most Dangerous Vehicles     35               0.004
International Art Crime     34               0.016
Black Bear Attacks          12               0.067
Implant Dentistry           5                0.466


Clues for query difficulty: query-independent features (Mothe & Tanguy 2005)

Morphological features:
- NBWORDS: number of words
- LENGTH: avg word length
- MORPH: avg # of morphemes per word
- SUFFIX: avg # of suffixed tokens
- PN: avg # of proper nouns
- ACRO: avg # of acronyms
- NUM: avg # of numerical values
- UNKNOWN: avg # of unknown tokens

Syntactic features:
- CONJ: avg # of conjunctions
- PREP: avg # of prepositions
- PP: avg # of personal pronouns
- SYNTDEPTH: avg syntactic depth
- SYNTDIST: avg syntactic link span

Semantic feature:
- SYNSETS: avg polysemy value

Semantic ambiguity (SYNSETS) is the only significant indicator of query difficulty (Pearson correlation ~ -0.3).
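As a hedged illustration of how such a correlation can be checked, the sketch below computes the Pearson correlation between a per-query polysemy feature and its average precision; the arrays are hypothetical placeholders, not the Mothe & Tanguy data.

```python
# Sketch: correlating one linguistic feature (avg polysemy) with per-topic
# average precision. The values below are hypothetical, not TREC data.
from scipy.stats import pearsonr

synsets = [4.5, 2.0, 3.1, 1.2, 5.0]        # avg polysemy value per query
avg_prec = [0.05, 0.31, 0.12, 0.47, 0.02]  # actual average precision per query

r, p = pearsonr(synsets, avg_prec)
print(f"Pearson r = {r:.2f}")  # negative r: more ambiguous queries do worse
```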


Clues for query difficulty (cont.): result-set features

[Figure: the top-10 result titles for the full query "Hubble Telescope Achievements" and for its sub-queries S1 "Hubble", S2 "Telescope", and S3 "Achievements", with a matrix counting the documents shared by each pair of lists. The S1, S2, and full-query lists overlap heavily (e.g., "Hubble troubles blamed on scrimping by NASA", "Flaw in Hubble telescope"), while S3's list is dominated by unrelated "achievements" documents (e.g., "Li Peng issues rules on teaching achievements").]

Measuring the agreement between experts: the Kappa statistic

              Doctor 1
Doctor 2      Healthy   Sick    Total
Healthy       a         b       a+b
Sick          c         d       c+d
Total         a+c       b+d     a+b+c+d = N

$$\kappa = \frac{\frac{a+d}{N} - \left(\frac{a+b}{N}\cdot\frac{a+c}{N} + \frac{c+d}{N}\cdot\frac{b+d}{N}\right)}{1 - \left(\frac{a+b}{N}\cdot\frac{a+c}{N} + \frac{c+d}{N}\cdot\frac{b+d}{N}\right)}$$
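A minimal sketch of this computation, directly implementing the formula above (the example counts are made up):

```python
def kappa(a: int, b: int, c: int, d: int) -> float:
    """Cohen's kappa for the 2x2 agreement table above."""
    n = a + b + c + d
    p_observed = (a + d) / n  # fraction of cases where the experts agree
    # Agreement expected by chance, from the marginal totals:
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical example: two doctors agreeing on 90 of 100 patients.
print(round(kappa(a=40, b=5, c=5, d=50), 2))  # 0.8
```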


Measuring the agreement between queries: the Kappa statistic

              Full query
Sub-query     Top 10    Other   Total
Top 10        a         b       a+b
Other         c         d       c+d
Total         a+c       b+d     a+b+c+d


Measuring kappa using the overlap

IDs of the top 10 documents retrieved:
Full query: "Magnetic Levitation Maglev"
Sub-query 1: Magnetic
Sub-query 2: Levitation
Sub-query 3: Maglev
Sub-query 4: levitation*magnetic
Sub-query 5: levitation*maglev
Sub-query 6: magnetic*maglev

[Table: the top-10 document IDs returned for the full query and for each sub-query, with the number of documents each sub-query's list shares with the full query's list; the overlap counts shown are 3, 4, 0, 4, 6, and 3.]
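A minimal sketch of the overlap computation, assuming a hypothetical search(query) function that returns the IDs of the top 10 retrieved documents:

```python
# Sketch: overlap between the full query's top 10 and each sub-query's top 10.
# search() is a hypothetical function returning the top-10 document IDs.
def overlap(full_top10, sub_top10):
    return len(set(full_top10) & set(sub_top10))

full = search("Magnetic Levitation Maglev")
for sub in ["Magnetic", "Levitation", "Maglev",
            "levitation*magnetic", "levitation*maglev", "magnetic*maglev"]:
    print(f"{sub}: {overlap(full, search(sub))} overlaps")
```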


Block diagram of Query Prediction

[Diagram: the full query is sent to the search engine (SE) and, in parallel, broken into keywords and lexical affinities; each resulting sub-query is sent to the SE. The overlap between the sub-queries' results and the full query's results is measured and used to estimate difficulty.]

Query prediction can be computed efficiently if we have access to the SE's scoring process.
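The sub-query generation step can be sketched as follows: each keyword alone, plus pairwise combinations standing in for lexical affinities, mirroring the six Maglev sub-queries above. Real lexical-affinity extraction (closely co-occurring word pairs) is more involved than this plain pairing.

```python
from itertools import combinations

def make_subqueries(keywords):
    """Sub-queries = each keyword alone, plus every keyword pair
    (a stand-in for lexical affinities, in the slide's '*' notation)."""
    return list(keywords) + ["*".join(pair) for pair in combinations(keywords, 2)]

print(make_subqueries(["magnetic", "levitation", "maglev"]))
# ['magnetic', 'levitation', 'maglev',
#  'magnetic*levitation', 'magnetic*maglev', 'levitation*maglev']
```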

Supervised learning using overlaps

[Diagram: for each training query, the sub-query overlap counts are converted into a canonical fixed-length representation (a histogram of overlap values); these feature vectors, together with labeled examples, are fed to a learning algorithm whose output is a per-query difficulty prediction (e.g., 0.8, 0.34, 0.02, 0.83, 0.36, 0.77, 0.02).]
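A hedged sketch of this pipeline: overlap counts are binned into a fixed-length histogram (the canonical representation) and a regressor is trained against the actual precision of the training queries. An off-the-shelf random forest stands in for the authors' learner here, and the training data is toy.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def to_histogram(overlaps, max_overlap=10):
    """Canonical representation: hist[k] = how many sub-queries shared
    exactly k documents with the full query's top 10."""
    hist = np.zeros(max_overlap + 1)
    for k in overlaps:
        hist[k] += 1
    return hist

# Toy training set: overlap lists per query, with their actual precision.
X = np.array([to_histogram(o) for o in ([2, 2, 1], [5, 4, 5], [0, 1, 0])])
y = np.array([0.34, 0.83, 0.02])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(model.predict(np.array([to_histogram([3, 3, 1])])))
```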

Average histogram for different query types

[Figure: average overlap histograms for three query types, in panels labeled Hard, Medium, and Easy; the x-axis of each panel is the number of overlaps.]

Additional query features useful for prediction
- The query terms' document frequency
- The top document score (a low top score is a good indication of difficulty)
- The number of terms in the query


Experiments: the Robust Track task
- A collection of approximately 500,000 documents
- 249 queries with known relevant documents
- Each query consists of 3 parts: Title, Description, Narrative

Example topic:
Title: African Civilian Deaths
Description: How many civilian non-combatants have been killed in the various civil wars in Africa?
Narrative: A relevant document will contain specific casualty information for a given area, country, or region. It will cite numbers of civilian deaths caused directly or indirectly by armed conflict.


Query Prediction evaluation

Following the TREC 2004 Robust track, we use Kendall's tau to measure the distance between the queries sorted by actual difficulty and the queries sorted by predicted difficulty.

Kendall's tau measures the distance between two ordered lists by counting the number of adjacent (bubble-sort) swaps needed to put one list in the order of the other. Example: sorting (2 3 1 4 5) into (1 2 3 4 5) takes two swaps,

(2 3 1 4 5) -> (2 1 3 4 5) -> (1 2 3 4 5),

so with 2 discordant pairs out of the 10 pairs of 5 items, KT = 1 - 2*2/10 = 0.6.
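The same statistic is available off the shelf; a minimal sketch reproducing the example above:

```python
# Sketch: Kendall's tau between an actual and a predicted ranking.
from scipy.stats import kendalltau

actual = [1, 2, 3, 4, 5]     # topics ordered by actual precision
predicted = [2, 3, 1, 4, 5]  # the same topics ordered by predicted difficulty

tau, _ = kendalltau(actual, predicted)
print(tau)  # 0.6 -- two discordant pairs out of ten
```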

Query Prediction results (Kendall-tau scores): comparison to other methods in the Robust track 2004

Method                          Title Query   Description Query
Top score                       0.260         0.379
Avg. top-10 scores              0.211         0.294
Standard deviation of the idf   0.110         0.243
Overlap                         0.371         0.571

(49 new queries of the TREC 2004 Robust track)


How the search engine can use Query Prediction
- Selective query expansion
- Modifying the search engine parameters
- Switching between query parts


Results (TREC Robust Track 2004)

Run name                              MAP     P@10    %no
Description only                      0.281   0.473   9.5
Description with AQE                  0.284   0.467   11.5
Description with selective AQE        0.285   0.478   9.5
Description with modified parameters  0.282   0.467   9.5
Title only                            0.271   0.437   10.0
Title + Description                   0.294   0.484   8.5
Title/Description switch              0.295   0.492   6.0


Identifying missing content queries: knowing what you (don't) know

Missing content queries (MCQs) are queries for which there are no relevant documents in the index.

Why is identification of MCQs important?
- For the search engine operator: logging the information that interests users but that the document collection cannot answer.
- For the user: identifying queries that cannot be answered by the search engine.

Identification of MCQs is performed using a modified version of the Query Prediction method.


Identifying missing content queries: proof of concept

Goal: distinguish MCQs from non-MCQs.

Experiment:
- All relevant documents of 170 queries were deleted from the collection, thus generating MCQs.
- An MCQ predictor was trained on these data (based on overlaps).
- Results were evaluated using 10-fold cross-validation.
- The obtained ROC area is over 0.34.
- The original Query predictor is then used to filter out easy queries; with this prefilter the obtained ROC area is over 0.9.

[System diagram: Query -> Query predictor -> MCQ predictor -> Decision.]
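A hedged sketch of the two-stage decision in the diagram: queries predicted to be easy are passed through, and only hard-looking queries reach the MCQ classifier. Both predictors and the threshold are hypothetical, not the authors' exact models.

```python
# Sketch of the system diagram's decision logic. `difficulty_predictor`
# and `mcq_classifier` are hypothetical trained models, and the threshold
# is an assumed tuning parameter.
def is_missing_content(query, difficulty_predictor, mcq_classifier,
                       easy_threshold=0.5):
    if difficulty_predictor.predict(query) >= easy_threshold:
        return False  # predicted easy: the index likely has relevant documents
    return mcq_classifier.predict(query)  # hard query: let the MCQ model decide
```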


Federation

Given several datasets that might contain information relevant to a given question, how do we construct a good unified list of answers from all these datasets?

[Diagram: each dataset is searched by its own search engine, and the per-dataset results are federated into a merged ranking.]

Federation using Query Prediction
- Train a predictor for each collection.
- For a given query: predict its difficulty for each collection, weight the results retrieved from each collection accordingly, and generate the federated list (see the sketch below).
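A minimal sketch of the prediction-weighted merge, assuming hypothetical per-collection engine and predictor objects:

```python
# Sketch: merge per-collection results, scaling each collection's scores
# by the difficulty predicted for the query on that collection.
# `engines` and `predictors` are hypothetical per-collection objects.
def federate(query, engines, predictors, k=10):
    merged = []
    for name, engine in engines.items():
        weight = predictors[name].predict(query)  # higher = query looks easier here
        for doc_id, score in engine.search(query):
            merged.append((weight * score, doc_id))
    merged.sort(reverse=True)  # unified ranked list
    return [doc_id for _, doc_id in merged[:k]]
```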


Results of the Federation experiment
- The TREC-8 collection was divided into 4 sub-collections.
- A query predictor was trained for each sub-collection.
- Search results returned from each sub-collection were merged by different merging schemes.

                     Merge method   P@10    %no
Single collection:   FBIS           0.176   51.8
                     FR94           0.070   75.1
                     FT             0.262   32.1
                     LA-Times       0.256   28.9
Merged:              Unweighted     0.338   20.9
                     CORI           0.347   17.7
                     Prediction     0.363   15.5


Meta-search using Query Prediction
- Part of the TREC-8 collection (LA Times) was indexed using four search engines.
- A predictor was trained for each search engine.
- For each query, the result set from each search engine was weighted using the prediction for that specific query.
- The final ranking is a ranking of the union of the result sets, weighted by the predictions.

[Diagram: four search engines over one collection; their results are federated into a merged ranking.]

How similar are the desktop search engines?


Results of the metasearch experiment

                      P@10    %no
Single search engine
  SE 1                0.139   47.8
  SE 2                0.153   43.4
  SE 3                0.094   55.2
  SE 4                0.171   37.4
Metasearch
  Round-robin         0.164   45.0
  MetaCrawler         0.163   34.9
  Prediction-based    0.183   31.7


Summary

Query difficulty estimation is:
- Achievable
- Low in computational cost
- A source of insight into what makes a query difficult for search engines
- Useful both in simple applications (improving effectiveness) and in more sophisticated ones (MCQ detection and distributed IR)

Future work:
- Improve prediction methods
- Better understand what makes a query difficult


Thanks! Any difficult questions?


Query estimation results (Kendall-tau scores)

Collection                 Query type    KT (MAP)   KT (P@10)
TREC (249 queries)         Title         0.254      0.253
                           Description   0.439      0.360
WT10G (100 queries)        Title         0.110      0.155
                           Description   0.093      0.140
TREC+WT10G (349 queries)   Title         0.312      0.291
                           Description   0.464      0.414

Observations:
- Better results for longer queries
- More training data does help