Towards a Formal Model for View Maintenance in Data Warehouses D. Agrawal, A. El Abbadi, A. Most´ efaoui, M. Raynal and M. Roy Towards a Formal Modelfor View Maintenance in Data Warehouses – p.1/22
Summary The Data Warehouse Problem Definitions Existing protocols A Formal Definition of the Problem Formal Definition of Data Objects Abstract Definition of View Management The Protocol A Virtual Topology A Pipelining Technique Towards a Formal Modelfor View Maintenance in Data Warehouses – p.2/22
The Data Warehouse Problem A set of databases x 1 , x 2 , · · · , x n How to efficiently query a database aggregate? x1 x2 x3 x4 x5 Query Towards a Formal Modelfor View Maintenance in Data Warehouses – p.3/22
The Data Warehouse Problem A set of databases x 1 , x 2 , · · · , x n How to efficiently query a database aggregate? By adding a Data Warehouse x1 x2 x3 x4 x5 Data Query Warehouse Towards a Formal Modelfor View Maintenance in Data Warehouses – p.3/22
Data Warehouse: Definition The Data Warehouse maintains a DB summary a Select-Project-Join (SPJ) expression: F ( X 1 , · · · , X n ) = Π A ( σ C ( X 1 ⊲ ⊳ X n )) ⊳ · · · ⊲ Data Warehouse (DWH) problem ≡ calculus of a “Simple” distributed function with changing Data Sources. Towards a Formal Modelfor View Maintenance in Data Warehouses – p.4/22
Extremal Solutions The DWH maintains the total aggregation of all Data Sources. costly in space unnecessary network usage The DWH stores no datum, and forwards queries to Data Sources high latency unnecessary network usage then, the DWH is just a proxy Towards a Formal Modelfor View Maintenance in Data Warehouses – p.5/22
Proposed Solutions The DWH maintains the SPJ expression F Periodically, it calculates the ∆ F Major Problem: asynchrony of updates on Data Sources Error Terms Towards a Formal Modelfor View Maintenance in Data Warehouses – p.6/22
Major Difficulties Asynchrony and distribution of the model: Consistency issues Performance issues network usage memory/disk usage on dwh. Complexity of proposed protocols: unproved algorithms need for a formal definition of the problem. Towards a Formal Modelfor View Maintenance in Data Warehouses – p.7/22
Formal Definitions (data) Data Objects denoted x i a data manager is associated with each x i can be updated and read using the query/update primitives Timeline: the successive values of x i are denoted ( x [ t ] i ) t> 0 . Towards a Formal Modelfor View Maintenance in Data Warehouses – p.8/22
Formal Definitions (operations) Data Operations add/remove, denoted ⊕ , for source updates associative commutative. a join operation, denoted ⊗ associative, commutative, distributive over ⊕ . Towards a Formal Modelfor View Maintenance in Data Warehouses – p.9/22
Formal Definitions (dwh) the Data Warehouse calculates F such that F = x 1 ⊗ x 2 ⊗ · · · ⊗ x n consistency is mandatory at any time. up-to-dateness is eventual for performance reasons Towards a Formal Modelfor View Maintenance in Data Warehouses – p.10/22
Abstract Def. of View Management A View Management protocol should satisfy: Validity any query on the dwh returns an f = x [ t 1 ] ⊗ · · · x [ t n ] n . 1 Order Consistency If q 1 = x [ t 1 1 ] ⊗ · · · x [ t 1 n ] (resp 1 n q 2 = x [ t 2 1 ] ⊗ · · · x [ t 2 n ] ) is the result of a query, if 1 n q 1 was issued before q 2 , then ∀ i, t 1 i ≤ t 2 i . Up-to-Dateness for any t > 0 , for any i ∈ [1 ..n ] , an infinite sequence of queries will return at least an f = F ( · · · , x [ t ′ ] i , · · · ) with t ′ ≥ t . Towards a Formal Modelfor View Maintenance in Data Warehouses – p.11/22
The Protocol: a single update Suppose that F = x 1 ⊗ x 2 ⊗ x 3 ⊗ x 4 . if x 1 is updated to x 1 ⊕ δ 1 , then the corresponding ∆ F is: ∆ F = δ 1 ⊗ x 2 ⊗ x 3 ⊗ x 4 x 1 ’s data manager sends δ 1 to x 2 : x 2 ’s data manager computes δ 1 ⊗ x 2 and sends the result to x 3 x 3 ’s data manager computes δ 1 ⊗ x 2 ⊗ x 3 when x 4 ’s data manager computes δ 1 ⊗ x 2 ⊗ x 3 ⊗ x 4 , it can send the result to the dwh Towards a Formal Modelfor View Maintenance in Data Warehouses – p.12/22
The Protocol: Concurrent Updates Now, suppose that both x 1 and x 2 are updated. F ′ = ( x 1 ⊕ δ 1 ) ⊗ ( x 2 ⊕ δ 2 ) ⊗ x 3 ⊗ x 4 F ′ = F ⊕ ∆ F ∆ F = ( δ 1 ⊗ x 2 ⊗ x 3 ⊗ x 4 ) ⊕ ( x 1 ⊗ δ 2 ⊗ x 3 ⊗ x 4 ) ⊕ ( δ 1 ⊗ δ 2 ⊗ x 3 ⊗ x 4 ) complexity increases with concurrency two solutions: 1. compute error terms 2. order the updates Towards a Formal Modelfor View Maintenance in Data Warehouses – p.13/22
The Protocol: a Virtual Topology the star topology (center: dwh, edges: nodes) is seen as a ring a token perpetually moves on the ring it generates a natural order on updates Towards a Formal Modelfor View Maintenance in Data Warehouses – p.14/22
The Protocol: Pipelining Updates The token generates a global time ( # of steps) the sites maintain an additional variable, the difference δ i between the current x i and the last commited x i . when an update made a total rotation, it can be integrated to the data warehouse. the token can contain up to n updates in commitment phase. Towards a Formal Modelfor View Maintenance in Data Warehouses – p.15/22
The Protocol: Code (1) when the token arrives to x i with sequence number sn : 1. let ∆ F = token [ i ] ; 2. if ( ∆ F � = ⊥ ) then sn ← sn + 1 ; send incr (∆ F , sn ) to DWH endif ; 3. token [ i ] ← ∆ i ; 4. ∀ j � = i do token [ j ] ← ( token [ j ] ⊗ ( x i ⊖ ∆ i )) enddo ; 5. ∆ i ← ⊥ ; 6. send token sn ( token, sn ) to next data Towards a Formal Modelfor View Maintenance in Data Warehouses – p.16/22
The Protocol: Code (2) when update ( δ i ) is received by x i : 1. x i ← x i ⊕ δ i ; 2. ∆ i ← ∆ i ⊕ δ i when incr (∆ F , sn ) is received by DWH : 1. wait ( next _ sn = sn ) ; 2. f ← f ⊕ ∆ F ; 3. next _ sn ← next _ sn + 1 Towards a Formal Modelfor View Maintenance in Data Warehouses – p.17/22
The Protocol: Sketch for the Proof Validity, Up-to-dateness and Order Consistency use a total order: the number of steps performed by the token induction on the content of the token Towards a Formal Modelfor View Maintenance in Data Warehouses – p.18/22
a Real Life Protocol How to make a quiescent protocol? when there is no update, then the token is destroyed. when an update occurs, the data source sends a request to the data warehouse if the token was destroyed, it is recreated Towards a Formal Modelfor View Maintenance in Data Warehouses – p.19/22
a Real Life Protocol (2) How to remove the ring assumption? in a star network, each message comes from/to the dwh the dwh incorporates updates and destroys/recreates the token when necessary Towards a Formal Modelfor View Maintenance in Data Warehouses – p.20/22
Extension: Multi Term Meta-datawarehouse: aggregation of multiple data warehouses a data object may appear in several views computed in the data warehouses x1 x2 x3 x4 x5 Meta−DWH DWH1 DWH2 x1x3x4+x2x3x4x5 synchronization problems, possible deadlocks. Towards a Formal Modelfor View Maintenance in Data Warehouses – p.21/22
Conclusion a formal definition of a database problem an abstract protocol provable can be adapted to fit to real-life systems efficient Towards a Formal Modelfor View Maintenance in Data Warehouses – p.22/22
Recommend
More recommend