
Core Computation for Data Exchange. Vadim Savenkov, Vienna University of Technology. DEIS 2010, November 9, 2010.


  1. Talk Outline: (1) Preliminaries, (2) Computing the core.

     Preliminaries: Labeled nulls and homomorphisms
     Consider a database model based on v-relations: unknown values are labeled, and the same label can have several occurrences in a database, unlike the usual SQL nulls (“Codd” tables).
     dom(J) = const(J) ∪ var(J), const(J) ∩ var(J) = ∅
     A basic data exchange framework.
     [Figure: the source instance I (no nulls) is related by Σst to the target instance J (contains labeled nulls); Σt constrains J.]
     Definition: A homomorphism h between two instances I and J maps dom(I) to dom(J) such that h(c) = c for all c ∈ const(I), and whenever R(x̄) ∈ I it holds that R(h(x̄)) ∈ J.

     Embedded implicational dependencies
     Tuple-generating dependencies
     ◮ Employee(Name, Project, Salary) → ∃Id ∃Dep (Staff(Id, Name, Dep) ∧ Wage(Id, Salary))
     ◮ Source-to-target (st) tgds: how the data must be transferred.
     ◮ Target tgds: generalize inclusion / join dependencies.
     ◮ Naive chase: for all ⟨Name, Salary⟩, add the instantiation of the conclusion atoms to the db. Replace existential variables by fresh distinct labeled nulls.
     Equality-generating dependencies
     ◮ Staff(Id, Name1, Dep1) ∧ Staff(Id, Name2, Dep2) → Dep1 = Dep2
     ◮ Generalize functional dependencies.

     Chase delivers a canonical universal solution.
     Example:
     τ1st: BasicUnit(C) → Course(Idc, C).
     τ2st: Tutorial(C, T) → Course(Idc, C), Tutor(Idt, T), Teaches(Idt, Idc).
     BasicUnit('C#') ⇒ Course(C1, 'C#')
     Tutorial('C#', 'Joe') ⇒ Course(C2, 'C#'), Tutor(T1, 'Joe'), Teaches(T1, C2)

     Cores and endomorphisms
     Fundamental paper “Core of a graph” by Hell and Nesetril [1992].
     ◮ All cores of a relational structure are isomorphic ⇒ “the core”.
     ◮ Homomorphically equivalent structures have isomorphic cores.
     • Contrast with: typically, there are infinitely many universal solutions for each source instance. (Just add tuples of distinct fresh labeled nulls.) All universal solutions are homomorphically equivalent.
     • Thus, a single core captures the whole infinite set USol(I, M).

     Formalizing “redundancy”
     An endomorphism is a homomorphism from an instance to itself. If an endomorphism maps an instance onto a proper subset of itself, it is called a proper endomorphism. Nulls that can be eliminated by proper endomorphisms are redundant.
     Definition: Let J be an instance. The core of J (denoted core(J)) is an endomorphic image of J for which no proper endomorphism exists.

     Let Σ be a set of tgds and egds, J an instance satisfying Σ, and J′ an endomorphic image of J. Does it hold that J′ ⊨ Σ?
     Consider Σ = { R(u, w), R(w, w), R(w, v) → R(u, v) } and J = { (x, z), (x, a), (z, y), (a, z), (a, a) }. Let h = { z → a, y → z } be an endomorphism; then h(J) = { (x, a), (a, z), (a, a) } ⊭ Σ. However, core(J) = { (x, a), (a, a) } ⊨ Σ.

     Definition: An idempotent endomorphism, i.e. r such that r(r(x)) = r(x) for all x, is called a retraction. Any endomorphism can be transformed into a retraction simply by iterating it long enough.
     Property ([Hell and Nesetril, 1992]): Let A be a relational structure and C its core. Then there exists a homomorphism h: A → C such that h(v) = v for all v ∈ dom(C).
     Consider a homomorphism r: A → C. Restricted to dom(C), r is one-to-one (otherwise, C would not be a core). Let G_r be a graph whose vertices are the elements of dom(C), and where an edge (x, y) denotes r(x) = y. Every edge of such a graph belongs to a cycle. For a cycle of length n, the vertices that occur in it are mapped to themselves by r^n. Moreover, r^n is still a homomorphism and thus must be one-to-one on C. Now, consider the graph G_{r^n}, etc.
     [Figure: example graphs on the vertices w, v, y, z, x, s.]

     Cores and embedded dependencies
     As we just showed, the core of a structure is a retract.
     Theorem (Fagin, Kolaitis, and Popa [2005b]): Let M = (S, T, Σst ∪ Σt) be a mapping where Σst is a set of source-to-target tgds, and Σt consists of target tgds and egds. Then, if J ∈ Sol(I, M) and J′ is a retract of J, then also J′ ∈ Sol(I, M).
     Proof (Excerpt). Consider a target tgd τ: φ(x̄) → (∃ȳ) ψ(x̄, ȳ) in Σt. To show: J′ ⊨ τ. Assume that for some ā, J′ ⊨ φ(ā). Since J′ ⊆ J, also J ⊨ φ(ā). Then, by J ⊨ τ, there is b̄ ∈ dom(J) such that J ⊨ ψ(ā, b̄). J′ being a retract means there exists h: J → J′ such that h(v) = v for all v ∈ var(J′). Hence, J′ ⊨ ψ(h(ā), h(b̄)). Since h(ā) = ā, we have J′ ⊨ ψ(ā, h(b̄)) and thus also J′ ⊨ τ.
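The counterexample with Σ = { R(u, w), R(w, w), R(w, v) → R(u, v) } can be checked mechanically. The following is a minimal brute-force sketch, not an efficient algorithm. It assumes that x is a constant and that a, y, z are labeled nulls (the slide does not say which terms are nulls; if x were a null as well, everything would collapse onto the loop (a, a)). All function names are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Instance J from the slide: a binary relation R given as a set of pairs.
J = {("x", "z"), ("x", "a"), ("z", "y"), ("a", "z"), ("a", "a")}
# Assumption (not stated on the slide): x is a constant, the rest are nulls.
CONSTANTS = {"x"}


def domain(inst):
    return sorted({term for fact in inst for term in fact})


def apply(h, inst):
    """Image of inst under h; terms missing from h are mapped to themselves."""
    return {(h.get(s, s), h.get(t, t)) for s, t in inst}


def satisfies_sigma(inst):
    """Check the single tgd R(u,w), R(w,w), R(w,v) -> R(u,v)."""
    return all((u, v) in inst
               for (u, w) in inst if (w, w) in inst
               for (w2, v) in inst if w2 == w)


def endomorphisms(inst):
    """Brute force: try every assignment of the nulls to domain elements."""
    dom = domain(inst)
    nulls = [t for t in dom if t not in CONSTANTS]
    for image in product(dom, repeat=len(nulls)):
        h = dict(zip(nulls, image))
        if apply(h, inst) <= inst:
            yield h


def retraction(h, inst):
    """Iterate the endomorphism h until the power h^k is idempotent."""
    dom = domain(inst)
    step = {t: h.get(t, t) for t in dom}
    power = dict(step)
    while any(power[power[t]] != power[t] for t in dom):
        power = {t: step[power[t]] for t in dom}
    return power


def core(inst):
    """Shrink inst by proper endomorphic images until none is left."""
    current = set(inst)
    while True:
        images = (apply(h, current) for h in endomorphisms(current))
        proper = [img for img in images if img < current]
        if not proper:
            return current
        current = min(proper, key=len)


h = {"z": "a", "y": "z"}                        # endomorphism from the slide
image = apply(h, J)                             # endomorphic image h(J)
retract_image = apply(retraction(h, J), J)      # image under the iterated h
```

Under these assumptions, `image` violates Σ while `retract_image` coincides with `core(J)` and satisfies Σ, which matches the slide's point that retracts, unlike arbitrary endomorphic images, preserve the dependencies.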

  2. Timeline
     2003 “Getting to the core” paper by Fagin, Kolaitis, and Popa at PODS (TODS version: 2005). Introduced cores in the context of data exchange. ST tgds + target egds.
     2005 In his PODS paper, Gottlob addresses full target tgds (very tricky!).
     2006 “Computing cores in polynomial time” paper by Gottlob and Nash (JACM version: 2010). Weakly acyclic sets of target tgds + egds (simulated by full tgds).
     2008 Pichler and S. add direct support for target egds along with weakly acyclic sets of tgds. (LPAR, TCS version: 2010)
     2009 (i) SIGMOD paper by Mecca, Papotti and Raunich, and “Laconic Schema Mappings” @ VLDB by ten Cate, Chiticariu, Kolaitis, and Tan. Computing cores directly, as part of the chase; no target constraints. (ii) PODS paper by Marnette presents a robust core-based semantics for data exchange.
     2010 Marnette, Mecca and Papotti consider direct core computation under target functional dependencies. (VLDB)

     Core Computation as a Postprocessing Step
     First chase, then reduce.
     [Figure: I → (1. chase Σst, 2. chase Σt) → J → (3. reduce) → core(J).]
     + Most general approach (handles also target constraints)
     - Performance

     Greedy algorithm [Fagin et al., 2005b], target egds
     Input: Source instance I, st tgds Σst, target egds Σt
     Output: A core of a universal solution for I under Σst ∪ Σt
     (1) Chase I with Σst ⇒ canonical pre-universal instance J̃.
     (2) Chase J̃ with Σt ⇒ if the chase fails, stop and return “failure”; otherwise, let J be a canonical universal solution.
     (3) Initialize J* to be J.
     (4) While there is a fact R(x̄) ∈ J* such that ⟨I, J* − {R(x̄)}⟩ ⊨ Σst, set J* to be J* − {R(x̄)}.
     (5) Return J*.
     Question: As is, works only with target egds. Why?
     - The source instance has to be available.

     Descent to the core via proper retractions
     ◮ As we have shown, a retract of a solution is itself a solution.
     ◮ Moreover, the core of a structure is unique (up to isomorphism).
     Compute an ever shrinking sequence of proper retractions: J, r1(J), r2(r1(J)), ...
     Retracts are solutions, so there is no need to test ⟨I, r_n(J)⟩ ⊨ Σ.
     ◮ How to find a proper retraction? Iterate a proper endomorphism.
     ◮ How to find a proper endomorphism? For general structures, we are likely to need exhaustive search.
     • CoreIdentification is DP-complete [Fagin et al., 2005b]
     • CoreRecognition is coNP-complete [Fagin et al., 2005b]
     ◮ What about solutions in data exchange?

     Blocks algorithm: idea
     Key idea: Blocks partition var(J) into mutually independent parts.
     Gaifman graph G_J of an instance J: the undirected graph (V, E) where V represents var(J) and (v1, v2) ∈ E whenever there is R(v̄) ∈ J such that v1, v2 ∈ v̄.
     Blocks correspond to connected components of G_J.
     Example:
     R(x, y), R(y, z), R(v, w)
     R(1, 2), R(2, 3), R(4, 5)

     Blocks algorithm: idea (2)
     Each homomorphism h: J → K can be represented as a union of homomorphisms h_{B_i}: J[B_i] → K for the blocks B_i of J.
     Recall how the canonical universal solution is created during the chase of the source instance I:
     • For each st tgd φ(x̄) → (∃ȳ) ψ(x̄, ȳ) and each ā such that I ⊨ φ(ā), ψ(ā, ȳ) is instantiated by replacing the elements of ȳ with fresh labeled nulls.
     Question: Suppose Σt = ∅ and J was created by chasing Σ = Σst. What can be said about the block size of J?

     Blocks algorithm: no target constraints
     Input: Source instance I, mapping Σst
     Output: A core of a universal solution for I under Σst
     (1) Chase I with Σst ⇒ canonical universal solution J.
     (2) Compute the blocks B_i of J, and initialize J′ to be J.
     (3) Check if h_i: J′[B_i] → J′ exists such that h_i(x) = h_i(y) for some x ∈ B_i and y ≠ x.
     (4) Set J′ = h(J′), where h extends h_i to the rest of dom(J′) as the identity mapping.
     (5) Return to step (3); when no block admits such an h_i, J′ is the core.

     Blocks algorithm: target egds
     A nice property allows lifting the blocks algorithm to target egds.
     Rigidity Lemma [Fagin et al., 2005b]: Let J̃ be the canonical pre-universal instance for some source I and mapping Σst ∪ Σt where Σt consists of egds. Moreover, let x and y be nulls from different blocks of J̃. If, in the course of the chase of J̃ with Σt, an equality x = y is enforced, then the term [x] (= [y]) standing for both x and y in the canonical universal solution J is rigid: any endomorphism of J maps [x] to itself.
     Example:
     J = { R(1, x), R(y, 2), R(1, 3), R(3, 2) }
     Σt = { R(1, x), R(y, 2) → x = y }
     Effectively, target egds can be simply ignored.
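The block decomposition that the blocks algorithm relies on can be sketched as a connected-components computation on the Gaifman graph of nulls. This is a minimal illustration, not the algorithm from the cited papers: it assumes facts are encoded as (relation name, argument tuple) pairs, that strings stand for labeled nulls and integers for constants, and the function names are hypothetical.

```python
from collections import defaultdict


def is_null(term):
    # Assumed encoding: strings are labeled nulls, integers are constants.
    return isinstance(term, str)


def blocks(instance):
    """Return the blocks of an instance: connected components of nulls
    in its Gaifman graph (two nulls are adjacent if they share a fact)."""
    adjacency = defaultdict(set)
    for _relation, args in instance:
        nulls = [t for t in args if is_null(t)]
        for n in nulls:
            adjacency[n].update(nulls)

    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        result.append(frozenset(component))
    return result


# The two rows of the slide's example, in the assumed encoding.
with_nulls = [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("v", "w"))]
ground = [("R", (1, 2)), ("R", (2, 3)), ("R", (4, 5))]
```

For `with_nulls` this yields the two blocks {x, y, z} and {v, w}, while `ground` contains no nulls and therefore has no blocks. Since homomorphisms can be searched for one block at a time, the cost of the exhaustive search is governed by the largest block rather than by the whole instance.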
