
What queries can the domain mediator answer for me?

Exporting and Interactively Querying Web Service-Accessed Sources: The CLIDE System


  1. Slide 1: Exporting and Interactively Querying Web Service-Accessed Sources: The CLIDE System
     Michalis Petropoulos. Database Seminar, February 2010.

     Slide 2: Large-Scale Data Integration Systems
     • Application domain (developer, applications): computer portals (CNET, PCWorld)
     • Integration domain (integration engineer, mediator, integrated schema): compatible combinations of computers, routers and printers
     • Source domain (web services): Dell computers by CPU, Cisco routers by rate, HP printers by speed
     • Source owner (source schemas, source data): Dell computers, Cisco routers, HP printers

     Slide 3: Large-Scale Data Integration Systems
     • The developer asks: "What queries can the mediator answer for me?"
     • CLIDE sits between the developer and the mediator.

     Slide 4: Running Example: Parameterized Views
     Schema
     • Computers (cid, cpu, ram, price)
     • NetCards (cid, rate, standard, interface)
     • Routers (rate, standard, price, type)
     Views
     • V1 ComByCpu(cpu) → (Computer)*: computers for a given cpu
         SELECT DISTINCT Com1.*
         FROM Computers Com1
         WHERE Com1.cpu = cpu
     • V2 ComNetByCpuRate(cpu, rate) → (Computer, NetCard)*: computers and NetCards for a given cpu and rate
         SELECT DISTINCT Com1.*, Net1.*
         FROM Computers Com1, NetCards Net1
         WHERE Com1.cid = Net1.cid
         AND Com1.cpu = cpu AND Net1.rate = rate
     • V3 RouWired() → (Router)*: wired routers
         SELECT DISTINCT Rou1.*
         FROM Routers Rou1
         WHERE Rou1.type = 'Wired'
     • V4 RouWireless() → (Router)*: wireless routers
         SELECT DISTINCT Rou1.*
         FROM Routers Rou1
         WHERE Rou1.type = 'Wireless'
     Conjunctive Queries CQ
     • Equality & comparison conditions
     • Parameters
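The four parameterized views above can be tried out locally. The following is a minimal sketch, assuming an in-memory SQLite database stands in for the Dell and Cisco web services; the Python function names and the storage of `rate` as text are assumptions made for illustration, and the sample rows are the A123/B123 tuples that appear later in the deck.

```python
import sqlite3

# Hypothetical stand-in for the sources: one in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Computers (cid TEXT, cpu TEXT, ram INTEGER, price INTEGER);
CREATE TABLE NetCards  (cid TEXT, rate TEXT, standard TEXT, interface TEXT);
CREATE TABLE Routers   (rate TEXT, standard TEXT, price INTEGER, type TEXT);
INSERT INTO Computers VALUES ('A123', 'P4', 512, 400), ('B123', 'P4', 1024, 550);
INSERT INTO NetCards  VALUES ('A123', '10', '.11b', 'USB'), ('B123', '54', '.11g', 'USB');
INSERT INTO Routers   VALUES ('10', '.11b', 50, 'Wireless'), ('54', '.11g', 120, 'Wireless');
""")


def com_by_cpu(cpu):
    """V1 ComByCpu(cpu) -> (Computer)*; the cpu parameter is mandatory."""
    sql = "SELECT DISTINCT Com1.* FROM Computers Com1 WHERE Com1.cpu = ?"
    return sorted(conn.execute(sql, (cpu,)).fetchall())


def com_net_by_cpu_rate(cpu, rate):
    """V2 ComNetByCpuRate(cpu, rate) -> (Computer, NetCard)*."""
    sql = (
        "SELECT DISTINCT Com1.*, Net1.* "
        "FROM Computers Com1, NetCards Net1 "
        "WHERE Com1.cid = Net1.cid AND Com1.cpu = ? AND Net1.rate = ?"
    )
    return sorted(conn.execute(sql, (cpu, rate)).fetchall())


def rou_wired():
    """V3 RouWired() -> (Router)*; takes no parameters."""
    sql = "SELECT DISTINCT Rou1.* FROM Routers Rou1 WHERE Rou1.type = 'Wired'"
    return sorted(conn.execute(sql).fetchall())


def rou_wireless():
    """V4 RouWireless() -> (Router)*; takes no parameters."""
    sql = "SELECT DISTINCT Rou1.* FROM Routers Rou1 WHERE Rou1.type = 'Wireless'"
    return sorted(conn.execute(sql).fetchall())


if __name__ == "__main__":
    print(com_by_cpu("P4"))
    print(com_net_by_cpu_rate("P4", "54"))
    print(rou_wireless())
```

Because the sketch exposes only these four functions, a caller has no way to list computers without supplying a cpu value, which is the restriction that makes some application queries infeasible.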

  2. Slide 5: Sophisticated Mediators Make Feasible Queries Hard to Predict
     Feasible Queries FQ
     • Equivalent CQ query rewritings using the views
     • Might involve more than one view
     • Order might matter
     Integrated schema
     • Puts together the Dell and Cisco schemas
     • The mediator reaches Dell through V1 and V2, and Cisco through V3 and V4
     Attribute Associations
     • (Computers.cid, NetCards.cid)
     • (NetCards.rate, Routers.rate)
     • (NetCards.standard, Routers.standard)

     Slide 6: Running Example
     • Infeasible query: Get all Computers
     • Feasible query: Get all 'P4' Computers, together with their NetCards and their compatible 'Wireless' Routers
     Mediator plan for the feasible query (the slide labels its steps A to E):
     • Call RouWireless() (V4), which returns
         Routers.* (rate, standard, price, type)
         10   .11b   50    Wireless
         54   .11g   120   Wireless
     • Call ComNetByCpuRate('P4', '10') and ComNetByCpuRate('P4', '54') (V2), which return
         Computers.* (cid, cpu, ram, price)   NetCards.* (cid, rate, standard, interface)
         A123   P4   512    400               A123   10   .11b   USB
         B123   P4   1024   550               B123   54   .11g   USB
     • Combine the results into the answer
         Computers.*              NetCards.*             Routers.*
         A123   P4   512    400   A123   10   .11b   USB   10   .11b   50    Wireless
         B123   P4   1024   550   B123   54   .11g   USB   54   .11g   120   Wireless

     Slide 7: Problem
     1. Large number of sources
     2. Large number of views (web services)
     3. Mediator capabilities
     The developer formulates an application query:
     • Is an application query feasible?
     • If not, how do I know which ones are feasible?
     Previous options:
     – The developer had to browse the view definitions and somehow formulate a feasible query
     – Or formulate queries until a feasible one is found (trial and error)
     No system-provided guidance

     Slide 8: The CLIDE Solution
     • CLIDE is a query formulation interface that interactively guides the developer toward feasible queries by employing a coloring scheme.
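The feasible query of the running example can be traced step by step. The sketch below assumes the sources are plain Python lists holding the tuples shown on the slide; the function names, and the choice to check the standard association inside the loop, are illustrative assumptions rather than the mediator's actual plan.

```python
# Sample tuples from the running example (rates kept as strings, as in the view calls).
COMPUTERS = [("A123", "P4", 512, 400), ("B123", "P4", 1024, 550)]
NETCARDS = [("A123", "10", ".11b", "USB"), ("B123", "54", ".11g", "USB")]
ROUTERS = [("10", ".11b", 50, "Wireless"), ("54", ".11g", 120, "Wireless")]


def rou_wireless():
    """V4 RouWireless(): needs no parameters, so it can be called first."""
    return [r for r in ROUTERS if r[3] == "Wireless"]


def com_net_by_cpu_rate(cpu, rate):
    """V2 ComNetByCpuRate(cpu, rate): both parameters must be bound before the call."""
    return [
        (com, net)
        for com in COMPUTERS
        for net in NETCARDS
        if com[0] == net[0] and com[1] == cpu and net[1] == rate
    ]


def p4_computers_with_wireless_routers():
    """Get all 'P4' computers with their NetCards and compatible wireless routers."""
    answer = []
    for router in rou_wireless():
        rate, standard = router[0], router[1]
        # The rate returned by V4 supplies the rate parameter of V2,
        # which is why the order of the view calls matters.
        for com, net in com_net_by_cpu_rate("P4", rate):
            if net[2] == standard:  # (NetCards.standard, Routers.standard) association
                answer.append(com + net + router)
    return answer


if __name__ == "__main__":
    for row in p4_computers_with_wireless_routers():
        print(row)
```

The infeasible query "Get all Computers" has no counterpart here: every view over Computers demands a cpu value, and nothing in the plan can enumerate the possible cpu values.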

  3. Slide 9: QBE-Like Interfaces
     • Example: Microsoft SQL Server

     Slide 10: CLIDE Interface
     Interface elements: Last/Next Step, Table Alias, Table Boxes, Selection Boxes, Projection Box, Feasibility Flag
     • Table, selection, projection and join actions
     • Feasibility Flag
     • Color-based suggestions

     Slide 11: Example Interaction, Snapshot 1
     • Yellow → required action
       – All feasible queries require this action
     • White → optional action
       – Feasible queries can be formulated with or without these actions

     Slide 12: Example Interaction, Snapshot 2
     • Blue → required choice of action
       – At least one feasible query cannot be formulated unless this action is performed
     • The mediator calls ComByCpu('P4') (V1), which returns
         cid    cpu   ram    price
         A123   P4    512    400
         B123   P4    1024   550
     • The query result keeps the projected columns
         ram    price
         512    400
         1024   550

  4. Slide 13: Example Interaction, Snapshot 3
     • * → any other constant
     • Red → prohibited action
       – Does not appear in any feasible query
       – Leads to a "Dead End" state

     Slide 14: Example Interaction, Snapshot 4
     Join lines:
     • Only yellow and blue are displayed
     • Must appear in Attribute Associations

     Slide 15: Example Interaction, Snapshot 5
     • The mediator calls RouWireless() (V4) and ComNetByCpuRate('P4', rate) (V2)
     • The slide also shows the intermediate Computers.*, NetCards.* and Routers.* tuples for computers A123 and B123
     • Query result
         ram    price   rate   interface   price
         512    400     10     USB         50
         1024   550     54     USB         120

     Slide 16: Demo

  5. Slide 17: CLIDE Properties
     • Completeness of Suggestions
       – Every feasible query can be formulated by performing yellow and blue actions at every step
     • Summarization of Suggestions
       – At every step, only a minimal number of actions is suggested, i.e., the ones that are needed to preserve completeness
     • Rapid Convergence By Following Suggestions
       – The shortest sequence of actions from a query to any feasible query consists of suggested actions

     Slide 18: Interaction Graph
     • Nodes are queries: one for each q ∈ CQ
     • Edges are actions: table, selection, projection and join actions
     • Green nodes are feasible queries
     • Infinitely big structure
       – All CQ queries
       – All possible combinations of actions formulating them
     Example edge labels in the diagram: Com1 (table action), Com1.ram, Com1.price, Com1.cpu='P4' (selection action), Net1, Com1.cid=Net1.cid (join action), Rou1, …

     Slide 19: Interaction Graph: Colorable Actions
     • Colorable actions A_C label outgoing edges of the current node
     Example edge labels in the diagram: Com1.cid, Com1.cpu, Com1.cid=*, Com1.cpu=*, Com1.ram=*, Com1.price=*, Com1.cid=Net1.cid, Net1.rate='54Mbps', Net1.rate=Rou1.rate, Net1, Rou1, Com2, Com2.cid=Net1.cid, Com2.cpu='P4', …

     Slide 20: Interaction Graph: Colors
     • Yellow action α
       – Every path from current node n to a feasible node contains α
     • Blue action α
       – At least one feasible query cannot be formulated unless this action is performed (summarization)
     • Red action α
       – No path to a feasible node contains α
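The path-based definitions of yellow and red can be checked on a small example. The sketch below assumes a tiny, finite, acyclic stand-in for the interaction graph; the node names, the edges and the choice of feasible nodes are all hypothetical. It does not attempt to separate blue from white, because that distinction depends on the summarization property, which this toy does not model.

```python
# Hypothetical finite fragment of the interaction graph: node -> {action: next node}.
GRAPH = {
    "Com1": {"Com1.cpu='P4'": "cpu", "Com1.ram": "ram", "Com2": "dead"},
    "cpu": {"Com1.ram": "cpu+ram"},
    "ram": {"Com1.cpu='P4'": "cpu+ram"},
}
FEASIBLE = {"cpu", "cpu+ram"}  # assumed feasible (green) nodes


def paths_to_feasible(graph, feasible, node, prefix=()):
    """Yield the action sequence of every path from node to a feasible node.

    Assumes the graph is acyclic; the real interaction graph is infinite.
    """
    if node in feasible:
        yield prefix
    for action, successor in graph.get(node, {}).items():
        yield from paths_to_feasible(graph, feasible, successor, prefix + (action,))


def color_actions(graph, feasible, current):
    """Color the outgoing actions of the current node (the colorable actions)."""
    paths = list(paths_to_feasible(graph, feasible, current))
    colors = {}
    for action in graph.get(current, {}):
        containing = [p for p in paths if action in p]
        if not containing:
            colors[action] = "red"  # no path to a feasible node contains it
        elif len(containing) == len(paths):
            colors[action] = "yellow"  # every path to a feasible node contains it
        else:
            colors[action] = "white or blue"  # not distinguished in this sketch
    return colors


if __name__ == "__main__":
    for action, color in color_actions(GRAPH, FEASIBLE, "Com1").items():
        print(f"{action}: {color}")
```

Enumerating all paths only works because the toy graph is finite, which is the gap that the closest feasible queries are meant to close.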

  6. Slide 21: Color Determined By a Finite Set of Feasible Queries
     Challenge: infinitely many feasible queries
     Solution: Closest Feasible Queries FQ_C
     • FQ_C is sufficient to color actions in A_C
     • Theorem: the set of Closest Feasible Queries is finite
     Challenge: how far can the Closest Feasible Queries FQ_C be?
     Solution: based on Maximally Contained Queries FQ_MC

     Slide 22: CLIDE Architecture
     • Front-End: the user performs actions; the front-end sends the current query and receives colored actions plus the feasibility flag
     • Back-End components: Color Algorithm, Closest Feasible Queries Algorithm, Maximally-Contained Rewriter
     • Other labels in the diagram: Seed Queries SQ, Closest Feasible Queries FQ_C, Minimal Feasible Extension Queries, Aliases, Parameters, Collapse Rule
     • Back-End inputs: Schemas, Views, Column Associations
     • The Back-End is invoked every time the user performs an action
       – i.e., the user arrives at a new node in the interaction graph

     Slide 23: Closest Feasible Queries FQ_C Algorithm
     Challenge: how far can the Closest Feasible Queries FQ_C be?
     Solution: Maximally Contained Queries FQ_MC
     • Compute maximally contained queries FQ_MC
     • Theorem: all FQ_C queries are reachable via a path of length p ≤ p_L
     • The radius p_L is the longest path to a maximally contained query

     Slide 24: Maximally Contained Queries FQ_MC
     • Q1: Get all Computers
     • Q2: Get all Computers with a given cpu (maximally contained query)
     • Q3: Get all Computers with a given cpu & ram (not maximally contained)
     • Q4: Get all Computers with a given ram (maximally contained query)
     • Assuming fixed SELECT clause (projection list)
     • Covered extensively in literature
       – MiniCon, Bucket, InverseRules algorithms
     • FQ_MC is finite
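The Q1 to Q4 example can be reproduced with a deliberately small model. The sketch below assumes each query is identified only by the set of attributes it binds with a parameter, that the projection list is fixed, and that a second view `ComByRam` exists; that view is hypothetical and is introduced only because the slide marks Q4 as maximally contained. This is not MiniCon, Bucket or InverseRules, just a brute-force illustration of the definition.

```python
from itertools import combinations

ATTRIBUTES = ("cpu", "ram")

# Parameters each view requires. ComByRam is a hypothetical view (assumption).
VIEW_PARAMS = {
    "ComByCpu": frozenset({"cpu"}),
    "ComByRam": frozenset({"ram"}),
}


def candidate_queries():
    """Every query of the form 'Get all Computers with a given <attributes>'."""
    for size in range(len(ATTRIBUTES) + 1):
        for combo in combinations(ATTRIBUTES, size):
            yield frozenset(combo)


def is_feasible(bound):
    """Feasible in this toy model if some view's parameters are all bound."""
    return any(params <= bound for params in VIEW_PARAMS.values())


def maximally_contained(target=frozenset()):
    """Feasible queries contained in the target that no other such query contains.

    In this model, binding more attributes yields a more restrictive query,
    so query a is contained in query b when a binds a superset of b's attributes.
    """
    feasible = [q for q in candidate_queries() if target <= q and is_feasible(q)]
    return [q for q in feasible if not any(other < q for other in feasible)]


if __name__ == "__main__":
    # Q1 binds nothing; the expected answers correspond to Q2 (cpu) and Q4 (ram).
    for query in maximally_contained():
        print(sorted(query))
```

Q3 binds both cpu and ram, so it is contained in Q2 and in Q4, and the final filter discards it.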
