Back to the sketch-board: Integrating keyword search, semantics, and information retrieval

Joel Azzopardi (1), Fabio Benedetti (2), Francesco Guerra (2), and Mihai Lupu (3)

1 University of Malta, joel.azzopardi@um.edu.mt
2 Università di Modena e Reggio Emilia, firstname.lastname@unimore.it
3 TU Wien, mihai.lupu@tuwien.ac.at

2nd International Conference / IKC 2016 / Cluj-Napoca, Romania, 8-9 September 2016

the sketch-board

two directions

Start from existing work [KE4IR, Corcoglioniti et al. 2016]:
1. experimenting with new semantic representations of the data;
2. experimenting with different measures for computing the closeness of documents and queries.

Contributions of this paper:
- we reproduce the work in KE4IR;
- we extend the work by introducing new semantic representations of data and queries;
- we change the scoring function from tf-idf to BM25 and a BM25 variant [Lipani et al. 2016].

1. new semantic representations

- started from a subset of the layers analyzed in KE4IR: only classes and entities referenced in the data
- hypothesis: this reduces the noise generated by spurious information
- extend this set in two ways:
  1. adding external classes and entities via PIKES (enriched set);
  2. refining and extending the annotations using DBpedia: use the textual description in the DBpedia abstract field and apply AlchemyAPI to it to extract additional entities.

2. text similarity measures

- BM25
- BM25 variant [Lipani et al. 2016]

combining terms and concepts

- Probabilistic Relevance Framework
- direct application not possible: terms and concepts do not share the same probability space
- calculated a separate S_E(q,d) score
- combine the two
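The combination formula itself did not survive in this transcript. As a rough illustration of what "combine the two" could look like, the sketch below merges a term-based score and a separate concept-based score S_E(q,d) by linear interpolation. The weighted sum, the `weight` parameter, and the function names are assumptions made for this example, not the authors' method.

```python
def combine_scores(term_score, concept_score, weight=0.5):
    """Interpolate a term score and a concept score.

    ASSUMPTION: a weighted sum is used purely for illustration; the
    combination actually used in the paper is not given in these slides.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * term_score + weight * concept_score


def rank_combined(term_scores, concept_scores, weight=0.5):
    """Rank document ids by the combined score, highest first.

    Both inputs map document id -> score; a document missing from one
    mapping is treated as scoring 0.0 in that space.
    """
    doc_ids = set(term_scores) | set(concept_scores)
    combined = {
        d: combine_scores(term_scores.get(d, 0.0), concept_scores.get(d, 0.0), weight)
        for d in doc_ids
    }
    # Ties are broken by document id so the ordering is deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))
```

Because the two scores live in different spaces, any such combination would in practice also need the scores brought onto comparable scales first; the sketch leaves that step out.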

Experiments

1. Using terms alone, comparing traditional BM25 (standard B) with the variation BVA, as well as the baseline in KE4IR;
2. Using terms (as in 1 above) after applying filtering based on concepts;
3. Combining ranking of terms and concepts; and
4. Combining ranking of terms and concepts as in 3, after applying filtering based on concepts.

Dataset

- 331 articles from the yovisto blog
- 570 words per article, on average
- 83 annotations per article, on average
- 35 queries inspired by the search log, manually annotated

text only

Classic BM25 parameters:
- k1 = 1.2
- k3 = 0
- b = 0.75
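To make the role of these parameters concrete, here is a minimal sketch of classic BM25 scoring over pre-tokenized documents. With k3 = 0 the query-term-frequency factor (k3 + 1)·qtf / (k3 + qtf) equals 1, so each distinct query term counts once. The function name and the choice of the non-negative idf form log(1 + (N - df + 0.5) / (df + 0.5)) are assumptions for this example; the slides do not say which idf form was used.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document in `docs` for `query`.

    `query` is a list of tokens and `docs` a list of token lists.
    ASSUMPTION: a non-negative idf variant is used for illustration.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        # k3 = 0: repeated query terms add nothing, so use distinct terms.
        for term in set(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # b controls length normalization, k1 controls tf saturation.
            length_norm = 1 - b + b * len(d) / avgdl
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores
```

For example, `bm25_scores(["semantic", "search"], docs)` ranks a document containing both terms above one containing only "search", and gives 0.0 to a document containing neither.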

Retrieval using terms and filter on concepts
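The results table for this configuration is not preserved here. As an illustration of the idea only, the sketch below keeps the term-based ranking order and drops documents that share no concept with the query. The overlap criterion ("at least one shared concept"), the function name, and the data layout are assumptions; the slides do not spell out the filtering rule.

```python
def filter_by_concepts(ranked_doc_ids, doc_concepts, query_concepts):
    """Keep ranked documents that share at least one concept with the query.

    ASSUMPTION: 'at least one shared concept' is a hypothetical filter
    criterion chosen for this example.
    `ranked_doc_ids` is a list ordered by term score, best first;
    `doc_concepts` maps document id -> iterable of concept identifiers.
    """
    query_concepts = set(query_concepts)
    if not query_concepts:
        # A query without annotated concepts cannot filter anything.
        return list(ranked_doc_ids)
    return [
        d for d in ranked_doc_ids
        if query_concepts & set(doc_concepts.get(d, ()))
    ]
```

The relative order of the surviving documents is still decided by the term score alone, which is what separates this setup from the combined-ranking configurations.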

Retrieval using combined ranking of terms and concepts

Retrieval using combined ranking of terms and concepts, and filter on concepts

Observations

- The best results are obtained on P@5 and P@10, improving the current state of the art on the provided test collection.
- On the top-heavy metrics (P@1 and MAP), the experiments show that it is extremely difficult to improve on the existing results.
- The increased precision obtained by our technique does not correspond to an increase in the NDCG and MAP scores, meaning that more correct documents are retrieved but they are ranked worse.
- The main benefit from the adoption of concepts is the filtering of the documents. When used for ranking, the results show that in most cases concepts introduce more noise than utility.
- Due to the small dataset and the small number of queries evaluated, the results cannot be generalized beyond this domain.
- In this particular domain, the BM25 variation introduced does not improve the scores.
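The third observation can look counter-intuitive, so the sketch below computes precision at k and average precision (the per-query quantity averaged in MAP) on two toy rankings. The rankings and relevance set are made up for illustration and are not taken from the experiments.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant hit,
    divided by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


# Toy data (hypothetical): three relevant documents, two rankings.
relevant = {"a", "b", "c"}
ranking_a = ["a", "x", "y", "z", "w"]  # one hit, at the very top
ranking_b = ["x", "y", "z", "a", "b"]  # two hits, but low in the list
```

Here `ranking_b` has the higher P@5 (0.4 against 0.2) yet the lower average precision, because its relevant documents appear further down the list.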
