BANKS: Browsing ANd Keyword Search in Relational Databases
B. Aditya, Gaurav Bhalotia, Soumen Chakrabarti, Arvind Hulgeri, Charuta Nakhe, Parag, S. Sudarshan
IIT Bombay
http://www.cse.iitb.ac.in/banks/
Motivation
• Web search engines are very successful
  - Simple and intuitive keyword query interface
• Database querying using keywords is desirable
  - Query languages, e.g., SQL/QBE, are not appropriate for casual users
  - Form interfaces are cumbersome and give limited views
• Examples of keyword queries on databases
  - E-store database: "camcorder panasonic"
  - Book store: "sudarshan databases"
• Differences from IR/Web search
  - Normalization splits related data across multiple tuples
  - Answer to a query is a set of (closely) connected tuples that match all given keywords
Aug 2002, VLDB 2002 Demo
Basic Model
• Database: modeled as a graph
  - Nodes = tuples
  - Edges = references between tuples
    · Foreign key, inclusion dependencies, etc.
    · Edges are directed
[Figure: example database graph with paper nodes ("BANKS: Keyword search…", "MultiQuery Optimization"), writes nodes, and author nodes (Charuta, S. Sudarshan, Prasan Roy)]
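A minimal sketch of the graph model above, assuming a toy in-memory representation. The `tuples` and `foreign_keys` dictionaries and the `build_graph` helper are hypothetical names chosen for illustration; they are not part of BANKS. Each tuple becomes a node, and each foreign-key reference becomes a directed edge from the referencing tuple to the referenced tuple.

```python
from collections import defaultdict

# Hypothetical toy data: a node is identified by (relation, key).
tuples = {
    ("paper", "p1"): {"title": "MultiQuery Optimization"},
    ("author", "a1"): {"name": "S. Sudarshan"},
    ("author", "a2"): {"name": "Prasan Roy"},
    ("writes", "w1"): {"author": "a1", "paper": "p1"},
    ("writes", "w2"): {"author": "a2", "paper": "p1"},
}

# Hypothetical schema description: relation -> {column: referenced relation}.
foreign_keys = {"writes": {"author": "author", "paper": "paper"}}


def build_graph(tuples, foreign_keys):
    """Return {node: [referenced nodes]} with one directed edge per foreign key."""
    edges = defaultdict(list)
    for (relation, key), row in tuples.items():
        for column, target_relation in foreign_keys.get(relation, {}).items():
            # Directed edge: referencing tuple -> referenced tuple.
            edges[(relation, key)].append((target_relation, row[column]))
    return dict(edges)


graph = build_graph(tuples, foreign_keys)
```

In this toy graph only the writes tuples have outgoing edges, which is why an answer connecting two authors needs a path through a writes tuple and the paper.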
Answer Model
• Rooted, directed tree connecting keyword nodes
  - May include internal nodes that contain no keywords
  - Root node has special significance
    · May be restricted to relations representing entities
    · Avoid relations representing relationships, e.g. "writes"
• An example: "sudarshan roy"
  [Figure: answer tree with the paper "MultiQuery Optimization", two writes nodes, and the author nodes S. Sudarshan and Prasan Roy]
• Multiple answers may exist
  - Ranked by proximity + prestige
Relevance Calculation
• Proximity
  - Forward edges: foreign key → primary key
  - Weight of forward edge is based on schema
    · E.g. "cites" link weight greater than "writes" link weight
  - May need backward edges to form answer tree
    · Weight of backward edge (u, v) ∝ indegree of u
• Node prestige based on indegree
• Answer tree relevance
  - Edge score E = 1 / Σ edge-weights
  - Node score N = Σ root- and leaf-node-weights
    · Ignore weights of internal nodes
  - Normalize and combine using weighting factor λ
    · Additive: (1 − λ) E + λ N; multiplicative: E N^λ
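The scoring formulas above can be sketched as follows. This is an illustrative reading of the slide, not the BANKS implementation: the function names are hypothetical, the slide does not say how E and N are normalized (so the combining functions assume their inputs are already normalized), and the sketch assumes a tree with at least one edge so that the edge-weight sum is nonzero.

```python
def edge_score(edge_weights):
    """E = 1 / (sum of edge weights); assumes at least one positive weight."""
    total = sum(edge_weights)
    if total <= 0:
        raise ValueError("edge score needs a positive total edge weight")
    return 1.0 / total


def node_score(root_weight, leaf_weights):
    """N = root weight plus leaf weights; internal nodes are ignored."""
    return root_weight + sum(leaf_weights)


def combine_additive(e, n, lam):
    """Additive combination (1 - lambda) * E + lambda * N."""
    return (1 - lam) * e + lam * n


def combine_multiplicative(e, n, lam):
    """Multiplicative combination E * N ** lambda."""
    return e * n ** lam


# Hypothetical example: two edges of weight 1 and an already-normalized N.
e = edge_score([1.0, 1.0])
additive = combine_additive(e, 1.0, 0.2)
multiplicative = combine_multiplicative(e, 1.0, 0.2)
```

With λ close to 0 the ranking is driven by proximity (E); with λ close to 1 the additive form is driven by prestige (N).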
Answer Trees
• Anecdotal results
  - "Mohan": C. Mohan at the top based on prestige (# of papers)
  - "Transaction": Jim Gray's classic paper and textbook at the top based on prestige (# of citations)
  - "Sunita Seltzer": no common papers, but both have papers with Stonebraker; system finds this connection
• Backward expanding search algorithm
  - Start at leaf nodes, each containing a query keyword
  - Run concurrent single-source shortest-path algorithms from each such node, traversing edges backwards
  - Confluence of backward paths identifies answer tree roots
  - Answer trees may not be generated in relevance order
    · Insert answers into a small buffer (heap) as they are generated
    · Output the highest-ranked answer from the buffer when the buffer is full
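The backward expanding search above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the BANKS code: the names `backward_roots` and `ranked_output` are hypothetical, the per-source shortest-path iterators are interleaved through one shared priority queue, the cost of a root is taken as the sum of its shortest-path distances to each keyword (which may double-count shared edges), and roots are not restricted to entity relations. The example edge weights, including the backward edges out of `p1`, are made up.

```python
import heapq
from collections import defaultdict


def backward_roots(edges, keyword_nodes):
    """Yield (root, cost, sources) for nodes that can reach every keyword.

    edges: {u: [(v, weight), ...]} for directed edges u -> v.
    keyword_nodes: one set of matching nodes per query keyword.
    """
    incoming = defaultdict(list)  # reversed adjacency: v -> [(u, weight)]
    for u, outs in edges.items():
        for v, w in outs:
            incoming[v].append((u, w))

    heap, tie = [], 0  # entries: (distance, tie-breaker, keyword, source, node)
    for k, nodes in enumerate(keyword_nodes):
        for s in nodes:
            heap.append((0.0, tie, k, s, s))
            tie += 1
    heapq.heapify(heap)

    settled = set()              # (keyword, source, node) already expanded
    reached = defaultdict(dict)  # node -> {keyword: (distance, source)}
    while heap:
        d, _, k, s, node = heapq.heappop(heap)
        if (k, s, node) in settled:
            continue
        settled.add((k, s, node))
        if k not in reached[node]:
            reached[node][k] = (d, s)
            # Confluence: backward paths from all keywords meet at this node.
            if len(reached[node]) == len(keyword_nodes):
                cost = sum(dist for dist, _ in reached[node].values())
                yield node, cost, dict(reached[node])
        for u, w in incoming[node]:  # traverse edges backwards
            if (k, s, u) not in settled:
                tie += 1
                heapq.heappush(heap, (d + w, tie, k, s, u))


def ranked_output(answers, buffer_size):
    """Approximately reorder answers (lowest cost first) with a bounded heap."""
    buffer = []
    for i, (root, cost, _sources) in enumerate(answers):
        heapq.heappush(buffer, (cost, i, root))
        if len(buffer) > buffer_size:  # buffer full: emit the best answer
            best_cost, _, best_root = heapq.heappop(buffer)
            yield best_root, best_cost
    while buffer:                      # drain the buffer at the end
        best_cost, _, best_root = heapq.heappop(buffer)
        yield best_root, best_cost


# Hypothetical graph: writes -> author/paper forward edges, paper -> writes backward edges.
edges = {
    "w1": [("a1", 1.0), ("p1", 1.0)],
    "w2": [("a2", 1.0), ("p1", 1.0)],
    "p1": [("w1", 2.0), ("w2", 2.0)],
}
roots = list(backward_roots(edges, [{"a1"}, {"a2"}]))
ranked = list(ranked_output(iter(roots), buffer_size=2))
```

In this example the paper `p1` is discovered first even though the writes nodes have a lower total cost, which illustrates why answers are buffered before output. Without the backward edges out of `p1`, the two backward searches would never meet.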
The BANKS System
[Figure: architecture. User ↔ (HTTP) ↔ BANKS (Web Server + Servlets) ↔ (JDBC) ↔ Database]
• Available on the web, with (part of) DBLP data
  - http://www.cse.iitb.ac.in/banks/
• No programming needed for customization
  - Minimal preprocessing to create indices and give weights to links
• Provides keyword search coupled with extensive browsing features
  - Schema browsing + data browsing
  - Hyperlinks are automatically added to all displayed results
  - Browsing data by grouping and creating crosstabs
  - Graphical display of data: bar charts, pie charts, etc.