
Using Negative Information in Search
Sauparna Palchowdhury, Sukomal Pal, Mandar Mitra
Indian Statistical Institute, 203 B T Road, Kolkata 700108, West Bengal, India
February 18, 2011

Introduction

Problem
• Verbose queries give users more latitude.
• Queries may contain negation, i.e. specifications of what is not wanted.
• Search engines use keyword matching rather than query understanding ⇒ keywords from negative portions are also used for matching.
• Does retrieval effectiveness improve on removing negation?

A Verbose Query with Negative Information
“I am looking for information about literary works (novels, stories, poetry) that have the partition of India as their subject. Works set in that period, but not having the partition as their central theme, are not of interest. Also irrelevant are historical / non-fiction accounts about the partition.”

Related Work
• An MSN search log showed 10% of 15 million web queries to be longer than 5 words.
• Query shortening techniques have been used.
• Identifying negation in medical reports.
• Sentiment analysis involves finding negative connotations.

Benchmark Collection
INEX - Initiative for the Evaluation of XML retrieval.
• Corpus - Full-text articles crawled from Wikipedia.
  · 2006 corpus: 659,388 documents, 4.6GB.
  · 2009 corpus: 2.6 million documents, 50.7GB.
• Queries - Natural language queries formulated by INEX participants.
  · The 2007, 2008 and 2009 query sets total 380 queries.

Sample INEX Query
<topic id="2009080" ct_no="268">
<title>international game show formats</title>
<description>I want to know about all the game show formats that have adaptations in different countries.</description>
<narrative>Any content describing game show formats with international adaptations are relevant. National game shows and articles about the players and producers are not interesting.</narrative>
</topic>
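A topic in this format can be read with a standard XML parser. The following is a minimal sketch, not the authors' tooling: the helper name `parse_topic` is hypothetical, and joining title, description and narrative into one verbose query is an assumption made for illustration.

```python
import xml.etree.ElementTree as ET

# The sample topic shown above, with the letter-spacing removed.
TOPIC_XML = """<topic id="2009080" ct_no="268">
  <title>international game show formats</title>
  <description>I want to know about all the game show formats that have adaptations in different countries.</description>
  <narrative>Any content describing game show formats with international adaptations are relevant. National game shows and articles about the players and producers are not interesting.</narrative>
</topic>"""


def parse_topic(xml_text):
    """Return a dict with the topic id and the text of each child field."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    fields["id"] = root.get("id")
    return fields


topic = parse_topic(TOPIC_XML)

# Assumption: the verbose query is the concatenation of all three fields.
verbose_query = " ".join(
    topic[name] for name in ("title", "description", "narrative")
)
print(topic["id"], "-", topic["title"])
```

The narrative field is where the negative sentence ("... are not interesting.") appears in this sample.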

Detection and Separation of Negative Information

Positive and Negative Parts of a Query
Whole query: I am looking for information about literary works (novels, stories, poetry) that have the partition of India as their subject. Works set in that period, but not having the partition as their central theme, are not of interest. Also irrelevant are historical / non-fiction accounts about the partition.
Positive part: I am looking for information about literary works (novels, stories, poetry) that have the partition of India as their subject. Works set in that period, but not having the partition as their central theme, are not of interest.
Negative part: Also irrelevant are historical / non-fiction accounts about the partition.

Separation Using a Classifier
• A Maximum-Entropy classifier was trained on manually separated query sets.
• Tested on the 2008 and 2009 sets.

Table: Classifier performance. "+ to -" indicates positive sentences wrongly classified as negative (and vice versa).

Test set   Accuracy   - to +   + to -   Training set
2008       90.3%      6.8%     3.0%     2007
2009       91.5%      5.4%     3.1%     2007, 2008
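Sentence-level separation of this kind can be sketched as follows. For two classes, a Maximum-Entropy classifier is equivalent to logistic regression, so the sketch trains a small logistic-regression model on binary bag-of-words features. The training sentences, the feature choice, the learning rate and all function names are assumptions for illustration; the slides do not describe the authors' features or implementation.

```python
import math
import re


def tokens(sentence):
    """Lower-cased word set, used as binary bag-of-words features."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def predict_proba(feats, weights, bias):
    """Probability that a sentence with these features is negative."""
    score = bias + sum(weights.get(f, 0.0) for f in feats)
    return 1.0 / (1.0 + math.exp(-score))


def train(examples, epochs=200, lr=0.1):
    """Batch gradient ascent on the log-likelihood (label 1 = negative)."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for sentence, label in examples:
            feats = tokens(sentence)
            err = label - predict_proba(feats, weights, bias)
            grad_b += err
            for f in feats:
                grad_w[f] = grad_w.get(f, 0.0) + err
        bias += lr * grad_b
        for f, g in grad_w.items():
            weights[f] = weights.get(f, 0.0) + lr * g
    return weights, bias


def split_query(query, weights, bias):
    """Classify each sentence and return (positive part, negative part)."""
    positive, negative = [], []
    for sentence in re.split(r"(?<=[.!?])\s+", query.strip()):
        p_neg = predict_proba(tokens(sentence), weights, bias)
        (negative if p_neg >= 0.5 else positive).append(sentence)
    return " ".join(positive), " ".join(negative)


# Hypothetical toy training set standing in for manually separated queries.
TRAINING = [
    ("I am looking for information about literary works", 0),
    ("I want to know about all the game show formats", 0),
    ("Any content describing game show formats is relevant", 0),
    ("Documents about international adaptations are relevant", 0),
    ("National game shows are not interesting", 1),
    ("Also irrelevant are historical accounts about the partition", 1),
    ("Articles about players and producers are not of interest", 1),
    ("Works set in that period are not relevant", 1),
]

weights, bias = train(TRAINING)
pos_part, neg_part = split_query(
    "I want information about game show formats. "
    "Articles about producers are not interesting.",
    weights, bias,
)
print("P:", pos_part)
print("N:", neg_part)
```

Only the positive part would then be submitted to the retrieval engine; a real system would need far more training data than this toy set.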

Retrieval and Evaluation

• The SMART retrieval engine.
• Vector space model.
• MAP (Mean Average Precision) is the evaluation metric.
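As a reminder of how the metric is computed, here is a minimal sketch of Average Precision and MAP over ranked result lists. The document identifiers and relevance judgments are made up for illustration, and the function names are not taken from SMART or any evaluation tool.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k over the ranks k where a relevant document appears.

    Relevant documents that are never retrieved contribute zero.
    Assumes ranked_ids contains no duplicates.
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs):
    """Average the per-query AP over (ranking, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two hypothetical queries with their rankings and relevance judgments.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```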

Overall Results
Table: Overall MAP. Figures in () show % change w.r.t. Q.

INEX year           run   Q        P                N
2008 (44 queries)   b     0.2586   0.2660 (2.9%)    0.2265 (-1.2%)
                    fb    0.2706   0.2827 (4.5%)    0.2496 (-7.8%)
2009 (36 queries)   b     0.2499   0.2642 (5.7%)    0.2348 (-6.0%)
                    fb    0.2504   0.2651 (4.4%)    0.2382 (-4.9%)

INEX year           run   Q        P_M              N_M
2008 (31 queries)   b     0.2564   0.2624 (2.3%)    0.2397 (-6.5%)
                    fb    0.2638   0.2748 (4.2%)    0.2574 (-2.4%)
2009 (36 queries)   b     0.2728   0.2790 (2.3%)    0.2768 (1.5%)
                    fb    0.2814   0.2897 (2.9%)    0.2914 (3.6%)
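The bracketed figures are relative changes with respect to the Q run. A minimal sketch of that calculation, using a hypothetical helper `percent_change` and three cells from the first table:

```python
def percent_change(baseline, value):
    """Relative change of value with respect to baseline, in percent."""
    return 100.0 * (value - baseline) / baseline


# 2008, run b: P against Q.
p_2008_b = round(percent_change(0.2586, 0.2660), 1)
# 2009, run b: P and N against Q.
p_2009_b = round(percent_change(0.2499, 0.2642), 1)
n_2009_b = round(percent_change(0.2499, 0.2348), 1)
print(p_2008_b, p_2009_b, n_2009_b)
```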

Per-Query Results
Figure: Performance of each query in set P. % change in Average Precision (AP) is plotted for the 44 queries. The change is computed with respect to their counterparts in Q.

Per-Query Results
Figure: Performance of each query in set N. % change in AP is plotted for the 44 queries. The change is computed with respect to their counterparts in Q.

Per-Query Results
Figure: Comparison of the performance of P_M with P.

Conclusion

Limitations
• Simplistic approach.
• Complicated negative phrases are not dealt with.
• A relatively small number of queries had both a positive and a negative part. Larger, more varied sets may have provided further insight.

Future Work
• Affecting term weights.
• Increasing the granularity of the corpus.

Thank you.
