Core Computation for Data Exchange
Vadim Savenkov
Vienna University of Technology
DEIS 2010, November 9, 2010

Talk Outline
1. Preliminaries
2. Computing the core

Preliminaries: Labeled nulls and homomorphisms
Consider a database model based on v-relations: unknown values are labeled, and the same label can have several occurrences in a database, unlike the usual SQL nulls (“Codd” tables).
dom(J) = const(J) ∪ var(J)
const(J) ∩ var(J) = ∅
A basic data exchange framework.
[Figure: Σst maps the source instance I (no nulls) to the target instance J (contains labeled nulls), which is constrained by Σt.]
Definition
A homomorphism h between two instances I and J maps dom(I) to dom(J) such that ∀c ∈ const(I) h(c) = c, and whenever R(x̄) ∈ I it holds that R(h(x̄)) ∈ J.

Embedded implicational dependencies
Tuple-generating dependencies
◮ Employee(Name, Project, Salary) → ∃Id ∃Dep (Staff(Id, Name, Dep) ∧ Wage(Id, Salary))
◮ Source-to-target (st) tgds: how the data must be transferred.
◮ Target tgds: generalize inclusion / join dependencies.
◮ Naive chase: ∀⟨Name, Salary⟩ add the instantiation of the conclusion atoms to the db. Replace existential variables by fresh distinct labeled nulls.
Equality-generating dependencies
◮ Staff(Id, Name₁, Dep₁) ∧ Staff(Id, Name₂, Dep₂) → Dep₁ = Dep₂
◮ Generalize functional dependencies.

Chase delivers a canonical universal solution.
Example
τ¹st: BasicUnit(C) → Course(Idc, C).
τ²st: Tutorial(C, T) → Course(Idc, C), Tutor(Idt, T), Teaches(Idt, Idc).
BasicUnit(’C#’) ⇒ Course(C₁, ’C#’)
Tutorial(’C#’, ’Joe’) ⇒ Course(C₂, ’C#’), Tutor(T₁, ’Joe’), Teaches(T₁, C₂)

Cores and endomorphisms
Fundamental paper “Core of a graph” by Hell and Nesetril [1992]
◮ All cores of a relational structure are isomorphic ⇒ “the core”
◮ Homomorphically equivalent structures have isomorphic cores.
• Contrast with: typically, there are infinitely many universal solutions for each source instance. (Just add tuples of distinct fresh labeled nulls.) All universal solutions are homomorphically equivalent.
• Thus, a single core captures the whole infinite set USol(I, M).

Formalizing “redundancy”
An endomorphism is a homomorphism from an instance into itself. If an endomorphism maps an instance onto a proper subset of itself, it is called a proper endomorphism. Nulls that can be eliminated by proper endomorphisms are redundant.
Definition
Let J be an instance. The core of J (denoted core(J)) is an endomorphic image of J for which no proper endomorphism exists.

Bet
Let Σ be a set of tgds and egds, J be an instance satisfying Σ, and J′ an endomorphic image of J. Does it hold that J′ ⊨ Σ?
Consider Σ = { R(u, w), R(w, w), R(w, v) → R(u, v) } and J = { (x, z), (x, a), (z, y), (a, z), (a, a) }. Let h = { z → a, y → z } be an endomorphism; then h(J) = { (x, a), (a, z), (a, a) } ⊭ Σ holds. However, core(J) = { (x, a), (a, a) } ⊨ Σ.

Cores and embedded dependencies
Definition
An idempotent endomorphism, i.e. r such that r(r(x)) = r(x) for all x, is called a retraction. Any endomorphism can be transformed into a retraction simply by iterating it long enough.
Property ([Hell and Nesetril, 1992])
Let A be a relational structure and C its core. Then, there exists a homomorphism h : A → C such that for all v ∈ dom(C), h(v) = v.
Consider a homomorphism r : A → C. Restricted to dom(C), r is one-to-one (otherwise, C would not be a core). Let G_r be a graph whose vertices are elements of dom(C), and an edge (x, y) denotes r(x) = y. Every edge of such a graph belongs to a cycle. For a cycle of length n, vertices that occur in it are mapped to themselves by rⁿ. Moreover, rⁿ is still a homomorphism and thus must be one-to-one on C. Now, consider the graph G_{rⁿ}, etc.
[Figure: example graphs over the vertices w, v, y, z, x, s.]
As we just showed, the core of a structure is a retract.
Theorem (Fagin, Kolaitis, and Popa [2005b])
Let M = (S, T, Σst ∪ Σt) be a mapping where Σst is a set of source-to-target tgds, and Σt consists of target tgds and egds. Then, if J ∈ Sol(I, M), and J′ is a retract of J, then also J′ ∈ Sol(I, M).
Proof (Excerpt).
Consider a target tgd τ : φ(x̄) → (∃ȳ) ψ(x̄, ȳ) in Σt. To show: J′ ⊨ τ.
Assume that for some ā, J′ ⊨ φ(ā). Since J′ ⊆ J, also J ⊨ φ(ā). Then, by J ⊨ τ, ∃b̄ ∈ dom(J) such that J ⊨ ψ(ā, b̄). J′ being a retract means there exists h : J → J′ such that ∀v ∈ var(J′) h(v) = v. Hence, J′ ⊨ ψ(h(ā), h(b̄)). Since h(ā) = ā, we have J′ ⊨ ψ(ā, h(b̄)) and thus, also J′ ⊨ τ.
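The example with Σ = { R(u, w), R(w, w), R(w, v) → R(u, v) } and the remark that iterating an endomorphism yields a retraction can be checked mechanically. The following is a minimal Python sketch, not part of the talk: the encoding of R as a set of string pairs and the helper names satisfies, image, is_endomorphism and to_retraction are illustrative assumptions.

```python
def satisfies(J):
    """Check the tgd R(u,w), R(w,w), R(w,v) -> R(u,v) on a set of pairs J."""
    return all((u, v) in J
               for (u, w1) in J
               for (w2, v) in J
               if w1 == w2 and (w1, w1) in J)

def image(h, J):
    """Apply the mapping h (identity where undefined) to every fact of J."""
    return {(h.get(a, a), h.get(b, b)) for (a, b) in J}

def is_endomorphism(h, J):
    """h is an endomorphism of J if every mapped fact is again in J."""
    return image(h, J) <= J

def to_retraction(h, J):
    """Iterate the endomorphism h until some power of it is idempotent."""
    dom = {t for fact in J for t in fact}
    r = {x: h.get(x, x) for x in dom}
    while any(r[r[x]] != r[x] for x in dom):
        # Next power of h: compose h with the current power r.
        r = {x: h.get(r[x], r[x]) for x in dom}
    return r

# Instance and endomorphism from the slide (hypothetical string encoding).
J = {("x", "z"), ("x", "a"), ("z", "y"), ("a", "z"), ("a", "a")}
h = {"z": "a", "y": "z"}

print(satisfies(J))                # J satisfies the tgd
print(is_endomorphism(h, J))       # h maps J into J
print(satisfies(image(h, J)))      # h(J) violates the tgd
r = to_retraction(h, J)
print(satisfies(image(r, J)))      # the retract satisfies it again
```

In this encoding the second power of h is already idempotent, and its image is the instance { (x, a), (a, a) } that the slide gives as core(J).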
Timeline
2003 “Getting to the core” paper by Fagin, Kolaitis, and Popa at PODS (TODS version: 2005). Introduced cores in the context of data exchange. ST tgds + target egds.
2005 In his PODS paper, Gottlob addresses full target tgds (very tricky!).
2006 “Computing cores in polynomial time” paper by Gottlob and Nash (JACM version: 2010). Weakly acyclic sets of target tgds + egds (simulated by full tgds).
2008 Pichler and S. add direct support for target egds along with weakly acyclic sets of tgds. (LPAR, TCS version: 2010)
2009 (i) SIGMOD paper by Mecca, Papotti and Raunich, and “Laconic Schema Mappings” @ VLDB by ten Cate, Chiticariu, Kolaitis, and Tan. Computing cores directly, as part of the chase; no target constraints.
(ii) PODS paper by Marnette presents a robust core-based semantics for data exchange.
2010 Marnette, Mecca and Papotti consider direct core computation under target functional dependencies. (VLDB)

Core Computation as a Postprocessing Step
First chase, then reduce
1. chase Σst
2. chase Σt
3. reduce
[Figure: I → J → core(J)]
+ Most general approach (handles also target constraints)
- Performance

Greedy algorithm [Fagin et al., 2005b], target egds
Input: Source instance I, st tgds Σst, target egds Σt
Output: A core of a universal solution for I under Σst ∪ Σt
(1) Chase I with Σst ⇒ Canonical pre-universal instance J̃.
(2) Chase J̃ with Σt ⇒ If the chase fails ⇒ stop and return “failure”; otherwise, let J be a canonical universal solution.
(3) Initialize J* to be J.
(4) While there is a fact R(x̄) ∈ J* such that ⟨I, J* − {R(x̄)}⟩ ⊨ Σst, set J* to be J* − {R(x̄)}.
(5) Return J*.
Question
As is, works only with target egds. Why?
- Source instance has to be available

Descent to the core via proper retractions
◮ As we have shown, a retract of a solution is itself a solution.
◮ Moreover, the core of a structure is unique (up to isomorphism).
Compute an ever shrinking sequence of proper retractions: J, r₁(J), r₂(r₁(J)), ...
Retracts are solutions, so no need to test ⟨I, rₙ(J)⟩ ⊨ Σ
◮ How to find a proper retraction? Iterate a proper endomorphism.
◮ How to find a proper endomorphism? For general structures, we are likely to need exhaustive search.
• CoreIdentification is DP-complete [Fagin et al., 2005b]
• CoreRecognition is coNP-complete [Fagin et al., 2005b]
◮ What about solutions in data exchange?

Blocks algorithm: idea
Key idea
Each homomorphism h : J → K can be represented as a union of h_{Bᵢ} : J[Bᵢ] → K for blocks Bᵢ of J.
Gaifman Graph G_J of instance J
Undirected graph (V, E) where V represents var(J) and (v₁, v₂) ∈ E whenever there is R(v̄) ∈ J such that v₁, v₂ ∈ v̄.
Blocks correspond to connected components of G_J.
Example
R(x, y), R(y, z), R(v, w)
R(1, 2), R(2, 3), R(4, 5)

Blocks algorithm: idea (2)
Blocks are mutually independent partitions of var(J).
Recall how the canonical universal solution is created during the chase of the source instance I:
◮ For each st tgd φ(x̄) → (∃ȳ) ψ(x̄, ȳ) and for each ā such that I ⊨ φ(ā), ψ(ā, ȳ) is instantiated by replacing the elements of ȳ with fresh labeled nulls.
Question
If Σt = ∅ and J was created by chasing Σ = Σst, what can be said about the block size of J?

Blocks algorithm: no target constraints
Input: Source instance I, mapping Σst
Output: A core of a universal solution for I under Σst
(1) Chase I with Σst ⇒ Canonical universal solution J.
(2) Compute the blocks Bᵢ of J, and initialize J′ to be J.
(3) Check if hᵢ : J′[Bᵢ] → J′ exists, s.t. hᵢ(x) = hᵢ(y) for some x ∈ Bᵢ and y ≠ x.
(4) Set J′ = h(J′), where h extends hᵢ to dom(I) as identity mapping.
(5) Return to step (3).

Blocks algorithm: target egds
A nice property allows lifting the blocks algorithm to target egds.
Rigidity Lemma [Fagin et al., 2005b]
Let J̃ be the canonical pre-universal instance for some source I and mapping Σst ∪ Σt where Σt consists of egds. Moreover, let x and y be nulls from different blocks of J̃. If, in the course of the chase of J̃ with Σt, an equality x = y is enforced, the term [x] (= [y]) standing for both x and y in the canonical universal solution J is rigid: any endomorphism of J maps [x] to itself.
Example
J = { R(1, x), R(y, 2), R(1, 3), R(3, 2) }
Σt = { R(1, x), R(y, 2) → x = y }
Effectively, target egds can be simply ignored.
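The block decomposition that the blocks algorithm relies on (connected components of the Gaifman graph over the nulls of J) can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not code from the talk: facts are encoded as (relation, arguments) pairs, labeled nulls as strings and constants as integers, and the names is_null and blocks are hypothetical.

```python
from collections import defaultdict

def is_null(term):
    # Assumed convention for this sketch: strings are labeled nulls,
    # integers are constants.
    return isinstance(term, str)

def blocks(J):
    """Return the blocks of J: connected components of the Gaifman graph,
    whose vertices are the nulls of J and whose edges join nulls that
    occur together in some fact."""
    adj = defaultdict(set)
    for _relation, args in J:
        nulls = [t for t in args if is_null(t)]
        for n in nulls:
            adj[n].update(nulls)
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        # Depth-first search collecting one connected component.
        component, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in component:
                continue
            component.add(n)
            stack.extend(adj[n] - component)
        seen |= component
        result.append(component)
    return result

# The example instance from the "Blocks algorithm: idea" slide.
J = [("R", ("x", "y")), ("R", ("y", "z")), ("R", ("v", "w")),
     ("R", (1, 2)), ("R", (2, 3)), ("R", (4, 5))]

for block in blocks(J):
    print(sorted(block))
```

On the slide's example this yields two blocks, one containing x, y, z and one containing v, w; the facts over constants contribute no vertices. Each block can then be searched for a homomorphism independently, which is what keeps the search space small when blocks are small.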