
XML Retrieval: DB/IR in Theory, Web in Practice (PowerPoint presentation)



  1. XML Retrieval: DB/IR in Theory, Web in Practice
   Sihem Amer-Yahia (Yahoo! Research), Mariano Consens (University of Toronto)
   In collaboration with: Ricardo Baeza-Yates (Yahoo! Research), Mounia Lalmas (Queen Mary, Univ. of London)
   VLDB 2007, Vienna, 26/09/07

  2. Preliminaries
   • DB has focused on languages, expressiveness and efficient evaluation
   • IR has focused on scoring and relevance metrics
   • In practice, a limited set of operations and simple ranking go a long way
   • Theory is scary (think XQuery)
   • Practice is inspiring but looks ad hoc

  3. Notion of Relevance
   • Data retrieval:
      ◦ Syntax expresses semantics
   • Information retrieval:
      ◦ Ambiguous semantics
      ◦ Relevance depends on the user and the context
      ◦ There is no "perfect" retrieval system
      ◦ User assessments are used to evaluate system effectiveness
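Effectiveness measured against user assessments is commonly summarized with precision and recall. The sketch below assumes binary (relevant / not relevant) judgments and uses hypothetical element IDs; the slide itself does not name a particular metric.

```python
from typing import List, Set, Tuple

def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Score a result list against user relevance assessments (binary judgments)."""
    hits = sum(1 for r in retrieved if r in relevant)        # retrieved AND judged relevant
    precision = hits / len(retrieved) if retrieved else 0.0  # fraction of results that are relevant
    recall = hits / len(relevant) if relevant else 0.0       # fraction of relevant items retrieved
    return precision, recall

# Hypothetical element IDs: what a system returned vs. what assessors marked relevant.
retrieved = ["e1", "e4", "e7", "e9"]
relevant = {"e1", "e7", "e8"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.6666666666666666)
```

Because relevance depends on user and context, the `relevant` set is only as good as the assessments behind it; a different assessor could yield different numbers for the same system.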

  4. Overview
   • Preliminaries
   • Web in Practice
      ◦ Search in Web 2.0
      ◦ Microformats and Mashups
   • DB/IR in Theory
      ◦ Query Languages
      ◦ Retrieval Semantics
      ◦ Evaluation à la DB (Query Processing)
      ◦ Evaluation à la IR (Relevance Assessments)
   • Challenges

  5. Web 2.0 (from Wikipedia)
   [Figure: Rich Set of Buzzwords]

  6. (Web) Search is a Basic Necessity
   A (grossly inadequate) analogy: Toilets and Web 2.0
   "Rich societies have developed quite complicated and expensive systems for removing human wastes from houses and cities, usually by dumping them, treated to one degree or another, into subsoils or bodies of water." (Peter Bane, 2006)

  7. Rich Standard Infrastructure
   [Figures: Standard Pipes; XML]

  8. Big Infrastructure Sites
   [Figures: Water Treatment Plants; Search Engines; Portals]

  9. Community Sites

  10. The Importance of Mobility
   The need to carry around technological solutions to basic necessities

  11. Most Commonly Used Is…
   [Figures: Squat toilet; "most popular searches" (2-3 keywords)]
   • There are simple and sophisticated solutions to basic necessities
   • Need for more sophisticated search

  12. Overview
   • Preliminaries
   • Web in Practice
      ◦ Search in Web 2.0
      ◦ Microformats and Mashups
   • DB/IR in Theory
      ◦ Query Languages
      ◦ Retrieval Semantics
      ◦ Evaluation à la DB (Query Processing)
      ◦ Evaluation à la IR (Relevance Assessments)
   • Challenges

  13. Microformats
   • Community data formats
      ◦ Personal Data: hCard (vCard)
      ◦ Calendar and Events: hCal (iCal)
      ◦ Social Networking: XFN
      ◦ Reviews: hReview
      ◦ Licenses: rel-license
      ◦ Folksonomies: rel-tag
   • Embedded in XHTML pages and RSS feeds
   • Also RSS Extensions (iTunes, Yahoo! Media, Geo, Google Base, 20+ more in use)
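Several of these formats (rel-tag, rel-license, XFN) live entirely in the `rel` attribute of ordinary links, so extracting them is a matter of reading that attribute. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming a well-formed fragment with no XHTML namespace declared; the URLs and the `links_by_rel` helper are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical XHTML fragment (no namespace declared, to keep tag lookups simple).
XHTML = """<div>
  <a href="http://example.org/tags/squat" rel="tag">squat</a>
  <a href="http://example.org/~alice" rel="friend met">Alice</a>
  <a href="http://example.org/about">About</a>
</div>"""

def links_by_rel(markup: str) -> Dict[str, List[str]]:
    """Group <a href> values by each space-separated rel token (rel-tag, XFN values, ...)."""
    groups: Dict[str, List[str]] = {}
    for a in ET.fromstring(markup).iter("a"):
        for rel in a.get("rel", "").split():  # links without rel contribute nothing
            groups.setdefault(rel, []).append(a.get("href", ""))
    return groups

rels = links_by_rel(XHTML)
print(rels["tag"])     # ['http://example.org/tags/squat']
print(rels["friend"])  # ['http://example.org/~alice']
```

Under the rel-tag convention the tag name is taken from the last segment of the link's URL (here `squat`), which is what lets folksonomy tags be harvested across sites without a shared registry.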

  14. Example: hCal
   <strong class="summary">Fashion Expo</strong> in <span class="location">Paris, France</span>:
   <abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to <abbr class="dtend" title="2006-10-23">22</abbr>
   • Large and growing list of websites
      ◦ Eventful.com
      ◦ LinkedIn
      ◦ Yedda
      ◦ upcoming.yahoo.com
      ◦ Yahoo! Local, Yahoo! Tech Reviews
   • Benefit from shared tools and practices (hCalendar creator, iCal extraction)
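The hCal markup above is ordinary XHTML whose class names carry the semantics, while `<abbr title="…">` holds the machine-readable value behind the human-readable text. A minimal extraction sketch, assuming the fragment is wrapped in a `vevent` container (the slide shows only the inner properties) and handling only the four properties used here; `parse_hcal` is a hypothetical helper, not a real library API.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# The slide's hCal fragment, wrapped in an assumed vevent root so it parses as XML.
HCAL = """<div class="vevent">
<strong class="summary">Fashion Expo</strong> in
<span class="location">Paris, France</span>:
<abbr class="dtstart" title="2006-10-20">Oct 20</abbr> to
<abbr class="dtend" title="2006-10-23">22</abbr>
</div>"""

PROPERTIES = ("summary", "location", "dtstart", "dtend")

def parse_hcal(markup: str) -> Dict[str, str]:
    """Extract hCal properties by class name."""
    event: Dict[str, str] = {}
    for el in ET.fromstring(markup).iter():
        for cls in el.get("class", "").split():
            if cls not in PROPERTIES:
                continue
            if el.tag == "abbr" and el.get("title"):
                event[cls] = el.get("title")          # machine-readable ISO date
            else:
                event[cls] = (el.text or "").strip()  # human-readable text
    return event

print(parse_hcal(HCAL))
# {'summary': 'Fashion Expo', 'location': 'Paris, France', 'dtstart': '2006-10-20', 'dtend': '2006-10-23'}
```

The `dtend` of 2006-10-23 next to the displayed "22" reflects the iCalendar convention that an end date is non-inclusive, so the event runs through October 22.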

  15. Semantic Mashups
   • A "semantic" mashup can find:
      ◦ A contact (hCard)
      ◦ Friends (XFN, FOAF)
      ◦ A recommended event to attend (hCal, hReview)
   • Microformats are the lower-case semantic web
   • Also Machine Tags (e.g., flickr:user=me)
      ◦ Tags that use a special syntax to define extra information about a tag
      ◦ Have a namespace, a predicate and a value (sounds familiar?)
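A machine tag such as `flickr:user=me` can be split mechanically into its namespace, predicate and value, which yields exactly the kind of triple the slide's "sounds familiar?" alludes to. A minimal sketch, assuming a simplified grammar (word-character namespace and predicate, arbitrary value, optional surrounding quotes); the real rules of any given tagging service may differ, and `parse_machine_tag` is hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed, simplified grammar: namespace:predicate=value
MACHINE_TAG = re.compile(r"^(?P<ns>[A-Za-z_]\w*):(?P<pred>[A-Za-z_]\w*)=(?P<value>.+)$")

class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str

def parse_machine_tag(tag: str) -> Optional[MachineTag]:
    """Split 'namespace:predicate=value' into a triple; return None for plain tags."""
    m = MACHINE_TAG.match(tag)
    if m is None:
        return None
    return MachineTag(m["ns"], m["pred"], m["value"].strip('"'))

for t in ["flickr:user=me", "squat", 'geo:lat="48.2082"']:
    print(t, "->", parse_machine_tag(t))
```

Plain tags such as `squat` fall through as `None`, so a search system can treat them as ordinary keywords while indexing machine tags as structured (namespace, predicate, value) facts.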

  16. Search in Mashup Creation

  17. Mashup Tools
   • Microsoft Popfly
   • IBM Project Zero
   • Yahoo! Pipes
      ◦ Allows developers to mash up web data
      ◦ A drag-and-drop editor that enables users to connect multiple Internet data sources
      ◦ A source is grabbed and searched!
      ◦ Both content and structure are queried
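The point that "both content and structure are queried" can be illustrated with a tiny fetch, filter, sort flow over an RSS feed. This is only a sketch in the spirit of such tools, not the Yahoo! Pipes API: the feed is a hypothetical inline string standing in for a fetched source, and `pipe` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical RSS feed standing in for a fetched source.
RSS = """<rss version="2.0"><channel>
  <item><title>Fashion Expo in Paris</title><category>events</category></item>
  <item><title>Paris travel tips</title><category>travel</category></item>
  <item><title>Design week, Paris</title><category>events</category></item>
</channel></rss>"""

def pipe(feed: str, category: str, keyword: str) -> List[str]:
    """Grab a source, filter it on structure and content, then sort the titles."""
    kept = []
    for item in ET.fromstring(feed).iter("item"):
        title = item.findtext("title", default="")
        on_structure = item.findtext("category") == category  # condition on an element
        on_content = keyword.lower() in title.lower()         # condition on text
        if on_structure and on_content:
            kept.append(title)
    return sorted(kept)

print(pipe(RSS, "events", "paris"))  # ['Design week, Paris', 'Fashion Expo in Paris']
```

The keyword alone would also return the travel item; it is the structural condition on `<category>` that narrows the result, which is the kind of structure-aware search the talk argues Web 2.0 users already want.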

  18. Yahoo! Pipes Demo

  19. Yahoo! Pipes Demo

  20. Yahoo! Pipes Demo Result

  21. Overview
   • Preliminaries
   • Web in Practice
      ◦ Search in Web 2.0
      ◦ Microformats and Mashups
   • DB/IR in Theory
      ◦ Retrieval Languages and Semantics
      ◦ Evaluation à la DB (Query Processing)
      ◦ Evaluation à la IR (Relevance Assessments)
   • Challenges

  22. Take Away
   • Search is crucial when accessing Web 2.0 sources
   • There is already demand for exploiting additional structure in Web 2.0 search
   • Structure (XML) retrieval needs to:
      ◦ be exposed to users/developers
      ◦ support rich, context-dependent semantics
      ◦ address efficiency and effectiveness

  23. Overview
   • Preliminaries
   • Web in Practice
   • DB/IR in Theory
      ◦ Query Languages
      ◦ Retrieval Semantics
      ◦ Evaluation à la DB (Query Processing)
      ◦ Evaluation à la IR (Relevance Assessments)
   • Challenges

  24. Languages
   • Keyword search
      ◦ "squat"
   • Tag + Keyword search
      ◦ description: squat
   • Path Expression + Keyword search
      ◦ //image[./title about "squat"]
   • XQuery + Complex full-text search
      ◦ for $i in //image let score $s := $i ftscore "squat" && "toilet" distance 2
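Each step up this ladder adds structural or positional constraints and narrows the answer. The sketch below emulates the four query styles over a hypothetical toy collection using Python's `xml.etree.ElementTree`, which supports only a small XPath subset, so this is an illustration of the semantics, not an XQuery Full-Text engine. Two assumptions: `about` is approximated by case-insensitive substring containment, and `distance 2` is read as "at most two words between the terms".

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical toy collection.
DOC = ET.fromstring("""<collection>
  <image><title>Squat toilet</title><description>A squat toilet in Vienna</description></image>
  <image><title>Bathroom</title><description>Modern toilet with an optional squat pan</description></image>
  <article><title>Squat exercises</title></article>
</collection>""")

def about(el: ET.Element, term: str) -> bool:
    """Crude stand-in for 'about': the term occurs somewhere in the element's text."""
    return term.lower() in "".join(el.itertext()).lower()

def keyword(term: str) -> List[str]:
    """'squat': every element whose own text mentions the term (no structure)."""
    return [el.tag for el in DOC.iter() if term.lower() in (el.text or "").lower()]

def tag_keyword(tag: str, term: str) -> List[ET.Element]:
    """'description: squat': restrict matches to one element name."""
    return [el for el in DOC.iter(tag) if about(el, term)]

def path_keyword(term: str) -> List[ET.Element]:
    """'//image[./title about "squat"]': images whose title child is about the term."""
    return [img for img in DOC.iterfind(".//image")
            if any(about(t, term) for t in img.findall("title"))]

def within_distance(text: str, t1: str, t2: str, n: int) -> bool:
    """'"squat" && "toilet" distance n': both terms, at most n words apart (assumed semantics)."""
    words = [w.strip(".,;:").lower() for w in text.split()]
    pos1 = [i for i, w in enumerate(words) if w == t1]
    pos2 = [i for i, w in enumerate(words) if w == t2]
    return any(abs(i - j) - 1 <= n for i in pos1 for j in pos2)

print(keyword("squat"))                          # ['title', 'description', 'description', 'title']
print(len(tag_keyword("description", "squat")))  # 2
print(len(path_keyword("squat")))                # 1
print([within_distance(d.text, "squat", "toilet", 2)
       for d in DOC.iter("description")])        # [True, False]
```

The counts shrink from four keyword hits to two tagged hits to one path hit, and the proximity condition separates the two descriptions that a plain conjunction of "squat" and "toilet" would treat alike.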

  25. Overview
   • Preliminaries
   • Web in Practice
   • DB/IR in Theory
      ◦ Query Languages
      ◦ Retrieval Semantics
      ◦ Evaluation à la DB (Query Processing)
      ◦ Evaluation à la IR (Relevance Assessments)
   • Challenges

  26. Retrieval Semantics
   • Structure search incorporates conditions on the underlying structure of a collection
   • Schemas help
      ◦ Schemas prescribe data and help validation
      ◦ They provide only a limited description of valid instances
   • New semantics
      ◦ Lowest Common Ancestor
      ◦ Query relaxation
      ◦ Overlapping elements
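Overlapping elements arise because when a section matches, its enclosing chapter and article usually match too, so a ranked list can return the same text several times. The slide does not prescribe a remedy; one common strategy, assumed here, is greedy overlap removal over elements identified by Dewey IDs (paths of child positions). `remove_overlap` and the scores are hypothetical.

```python
from typing import List, Tuple

Dewey = Tuple[int, ...]       # element position, e.g. (1, 2) is the 2nd child of root 1
Result = Tuple[Dewey, float]  # (element, relevance score)

def overlaps(a: Dewey, b: Dewey) -> bool:
    """Elements overlap when one is an ancestor-or-self of the other (one ID prefixes the other)."""
    k = min(len(a), len(b))
    return a[:k] == b[:k]

def remove_overlap(results: List[Result]) -> List[Result]:
    """Greedy: keep the best-scoring element, drop everything overlapping it, repeat."""
    kept: List[Result] = []
    for node, score in sorted(results, key=lambda r: r[1], reverse=True):
        if not any(overlaps(node, other) for other, _ in kept):
            kept.append((node, score))
    return kept

# Hypothetical scores: article (1,), its sections (1, 2) and (1, 3),
# a subsection (1, 2, 1), and a second article (2,).
results = [((1,), 0.4), ((1, 2), 0.9), ((1, 2, 1), 0.7), ((1, 3), 0.6), ((2,), 0.5)]
print(remove_overlap(results))  # [((1, 2), 0.9), ((1, 3), 0.6), ((2,), 0.5)]
```

Greedy removal favours whichever granularity scores best; it discards the subsection (1, 2, 1) and the article (1,) because both overlap the top-ranked section.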

  27. Lowest Common Ancestor
   • Retrieve the most relevant fragment
   • References:
      ◦ Nearest Concept Queries (Schmidt et al., ICDE 2002)
      ◦ XRank (Guo et al., SIGMOD 2003)
      ◦ Schema-Free XQuery (Li et al., VLDB 2004)
      ◦ XKSearch (Xu & Papakonstantinou, SIGMOD 2005)
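With Dewey IDs, the lowest common ancestor of a set of nodes is simply their longest common prefix, and the "smallest" LCAs (those with no other answer nested beneath them) are natural candidates for the most relevant fragment. The brute-force sketch below only illustrates these semantics; the referenced papers give efficient algorithms, and the postings lists and helper names here are hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Dewey = Tuple[int, ...]

def lca(nodes: Tuple[Dewey, ...]) -> Dewey:
    """Lowest common ancestor of Dewey-labeled nodes = their longest common prefix."""
    prefix: List[int] = []
    for parts in zip(*nodes):
        if all(p == parts[0] for p in parts):
            prefix.append(parts[0])
        else:
            break
    return tuple(prefix)

def slca(postings: Dict[str, List[Dewey]]) -> Set[Dewey]:
    """Smallest LCAs: LCAs of one match per keyword with no other candidate below them."""
    candidates = {lca(combo) for combo in product(*postings.values())}
    return {c for c in candidates
            if not any(d != c and d[:len(c)] == c for d in candidates)}

# Hypothetical postings: Dewey IDs of the nodes containing each keyword.
postings = {
    "squat":  [(1, 1, 1), (1, 2, 1)],
    "toilet": [(1, 1, 2), (1, 3)],
}
print(sorted(slca(postings)))  # [(1, 1)]
```

The root (1,) is also a common ancestor of some keyword pairs, but it is discarded because the tighter fragment (1, 1) already contains both keywords; the enumeration over all combinations is exponential in the number of keywords, which is why dedicated algorithms matter.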
