Formal approaches to: Coordination-free query evaluation and multi-query optimization in parallel and distributed systems
Bas Ketsman
Outline
▶ Coordination-free evaluation: CALM Formalization, CALM Revision, Conclusion
▶ Multi-Query optimization: Parallel-Correctness, Transferability, Conclusion
Introduction
Context: Declarative Networking, where Datalog-based languages are used for parallel and distributed computing in clusters with disordered communication.
CALM-conjecture: No-coordination ?= Monotonicity [Hellerstein, 2010]
[Ameloot, Neven, Van den Bussche, 2011]: TRUE
▶ for a setting where nodes have no information about the horizontal distribution of records
[Zinn, Green, Ludäscher, 2012]: FALSE
▶ for settings where nodes have information about the horizontal distribution of records
Goal: To clarify the relation between monotonicity and coordination in asynchronous systems, and to reveal a more complete picture.
Monotonicity
Definition: A query Q is monotone if Q(I) ⊆ Q(I ∪ J) for all database instances I and J.
Notation: M = class of monotone queries
Example
▶ Q_∆: Select triangles in a graph (∈ M)
▶ Q_<: Select open triangles in a graph (∉ M)
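The two example queries can be checked against the definition on a small instance. The sketch below is illustrative only: it assumes a graph is a set of directed edges, that a triangle is a cycle a→b→c→a over three distinct vertices, and that an open triangle is a path a→b→c whose closing edge c→a is missing. The function names are hypothetical and not from the talk.

```python
def triangles(edges):
    """Q_triangle: all (a, b, c) with edges a->b, b->c and c->a."""
    return {(a, b, c)
            for (a, b) in edges
            for (b2, c) in edges
            if b2 == b and (c, a) in edges and len({a, b, c}) == 3}


def open_triangles(edges):
    """Q_open: all (a, b, c) with edges a->b, b->c but no edge c->a."""
    return {(a, b, c)
            for (a, b) in edges
            for (b2, c) in edges
            if b2 == b and (c, a) not in edges and len({a, b, c}) == 3}


def preserved_under_union(query, I, J):
    """Check the monotonicity condition Q(I) <= Q(I | J) for one pair (I, J)."""
    return query(I) <= query(I | J)


I = {(1, 2), (2, 3)}
J = {(3, 1)}  # the edge that closes the path 1 -> 2 -> 3

# Triangles only grow when edges are added.
print(preserved_under_union(triangles, I, J))       # True
# The open triangle (1, 2, 3) disappears once (3, 1) arrives.
print(preserved_under_union(open_triangles, I, J))  # False
```

A single pair (I, J) can only refute monotonicity, as it does for the open-triangle query; it does not prove it for the triangle query.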
Relational Transducer Networks [Ameloot, Neven, Van den Bussche, 2011]
▶ Network N = {x, y, u, z}
▶ Transducer Π
▶ Messages can be arbitrarily delayed but never get lost
Semantics are defined in terms of runs over a transition system.
Eventually Consistent Query Evaluation
Definition: A transducer Π computes a query Q if
▶ for all networks N, (network independent)
▶ for all databases I,
▶ for all horizontal distributions H, and (distribution independent)
▶ for every run of Π, out(Π) = Q(I). (consistency requirement)
Example: Q_∆: select all triangles
Algorithm:
▶ Broadcast all data
▶ Output triangles whenever new data arrives
Extremely naive, but it works, and it is coordination-free!
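The naive algorithm can be simulated in a few lines. This is a rough sketch, not the transducer formalism itself: it assumes a graph is a set of directed edges, models "arbitrarily delayed but never lost" by delivering all messages in a random order, and uses hypothetical names (`run_broadcast`, `distribution`) that do not come from the talk.

```python
import random


def triangles(edges):
    """All (a, b, c) with edges a->b, b->c and c->a over distinct vertices."""
    return {(a, b, c)
            for (a, b) in edges
            for (b2, c) in edges
            if b2 == b and (c, a) in edges and len({a, b, c}) == 3}


def run_broadcast(distribution, seed):
    """Simulate one run: every node broadcasts its facts, outputs on arrival.

    distribution maps each node name to its local set of edges.
    """
    rng = random.Random(seed)
    nodes = list(distribution)
    local = {x: set(fragment) for x, fragment in distribution.items()}

    # Each node sends every local fact to every other node.
    in_flight = [(y, fact)
                 for x in nodes
                 for fact in distribution[x]
                 for y in nodes if y != x]
    rng.shuffle(in_flight)  # arbitrary delivery order, nothing is lost

    output = set()
    for x in nodes:
        output |= triangles(local[x])  # output from the initial fragment
    for y, fact in in_flight:
        local[y].add(fact)
        output |= triangles(local[y])  # output again when new data arrives
    return output


I = {(1, 2), (2, 3), (3, 1), (3, 4)}
distribution = {"x": {(1, 2)}, "y": {(2, 3)}, "u": {(3, 1)}, "z": {(3, 4)}}

# Every simulated run ends with exactly Q(I), whatever the delivery order.
results = [run_broadcast(distribution, seed) for seed in range(20)]
print(all(r == triangles(I) for r in results))
```

Because the triangle query is monotone, anything a node outputs from partial data remains correct later, so no output ever has to be retracted.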
Example: Q_<: select all open triangles
Coordination is needed to reason about the absence of records.
Coordination-freeness
Goal: separate data-communication from coordination-communication
Definition [Ameloot, Neven, Van den Bussche, 2011]: Π is coordination-free if for all inputs I there is a distribution on which Π computes Q(I) without having to do communication.
Example: Ideal Distribution
Q_∆: select all triangles
Algorithm:
▶ (Broadcast all data)
▶ Output triangles whenever new data arrives
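To make the ideal-distribution idea concrete, the sketch below compares what nodes can output before any message is read. It assumes, for illustration only, that the ideal distribution places all of I on every node; the helper `output_without_messages` and the edge-set encoding are hypothetical.

```python
def triangles(edges):
    """All (a, b, c) with edges a->b, b->c and c->a over distinct vertices."""
    return {(a, b, c)
            for (a, b) in edges
            for (b2, c) in edges
            if b2 == b and (c, a) in edges and len({a, b, c}) == 3}


def output_without_messages(distribution):
    """Union of what each node outputs from its local fragment alone."""
    out = set()
    for fragment in distribution.values():
        out |= triangles(fragment)
    return out


I = {(1, 2), (2, 3), (3, 1)}
nodes = ["x", "y", "u", "z"]

# Assumed ideal distribution: every node already holds all of I.
ideal = {n: set(I) for n in nodes}
# A scattered distribution: no node sees a whole triangle locally.
scattered = {"x": {(1, 2)}, "y": {(2, 3)}, "u": {(3, 1)}, "z": set()}

print(output_without_messages(ideal) == triangles(I))  # True
print(output_without_messages(scattered))              # set()
```

On the ideal distribution the broadcast step adds nothing, which is why it is parenthesized above; on the scattered one the data still has to move, but that is data communication rather than coordination.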
CALM-conjecture [Ameloot, Neven, Van den Bussche, 2011]
A query has a coordination-free and eventually consistent execution strategy iff the query is monotone.
Definition: F_0 = set of queries that are distributedly computed by coordination-free transducers
Theorem: F_0 = M
Policy-aware Transducers
[Figure: "Distribution policy"]
Policy-aware Transducers
Deduction rules
▶ in local database ⇒ in global database
▶ not in local database + in scope ⇒ not in global database
▶ not in local database + not in scope ⇒ unknown
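The three deduction rules translate directly into a small decision function. The sketch assumes that "in scope" means the distribution policy assigns the fact to the node asking, and that a node's local database holds exactly the global facts assigned to it; `Status`, `deduce`, and the parity-based `policy` are hypothetical names chosen for illustration.

```python
from enum import Enum


class Status(Enum):
    PRESENT = "in global database"
    ABSENT = "not in global database"
    UNKNOWN = "unknown"


def deduce(fact, node, local_db, policy):
    """Apply the three deduction rules from the point of view of one node."""
    if fact in local_db:
        return Status.PRESENT   # in local database => in global database
    if node in policy(fact):
        return Status.ABSENT    # missing locally although in scope
    return Status.UNKNOWN       # missing locally and out of scope


def policy(fact):
    """Hypothetical policy: edges with an even source go to x, others to y."""
    source, _ = fact
    return {"x"} if source % 2 == 0 else {"y"}


global_db = {(2, 3), (1, 2)}
local_x = {f for f in global_db if "x" in policy(f)}  # {(2, 3)}

print(deduce((2, 3), "x", local_x, policy))  # Status.PRESENT
print(deduce((4, 5), "x", local_x, policy))  # Status.ABSENT
print(deduce((1, 2), "x", local_x, policy))  # Status.UNKNOWN
```

The second case is the one that matters for non-monotone queries: node x can conclude that (4, 5) is absent from the global database without asking anyone.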
Policy-aware Transducers [Zinn, Green, Ludäscher, 2012]
Definition: A distribution policy P for σ and N is a total function from facts(σ) to the power set of N.
Definition: A policy-aware transducer is a transducer with access to P restricted to its active domain.
Definition: F_1 = set of queries that are distributedly computed by policy-aware coordination-free transducers
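A distribution policy can be pictured as an ordinary function from facts to sets of nodes. The sketch below is an assumption-laden illustration: the schema is a single binary edge relation, the modulo-based `policy` is invented for the example, the induced distribution gives node n every fact f with n ∈ P(f), and "restricted to its active domain" is read as "P can be queried only on facts built from values the node has seen locally".

```python
from itertools import product

NODES = ("x", "y", "u", "z")


def policy(fact):
    """Hypothetical policy P: place an edge according to its source value."""
    source, _ = fact
    return {NODES[source % len(NODES)]}


def distribute(instance, policy, nodes):
    """Horizontal distribution induced by P: node n gets f when n is in P(f)."""
    return {n: {f for f in instance if n in policy(f)} for n in nodes}


def visible_policy(local_db, policy):
    """P restricted to facts over the active domain of one node."""
    adom = {value for fact in local_db for value in fact}
    return {fact: policy(fact) for fact in product(adom, repeat=2)}


I = {(1, 2), (2, 3), (3, 1), (4, 1)}
H = distribute(I, policy, NODES)
print(H["y"])                         # {(1, 2)}

view = visible_policy(H["y"], policy)
print(sorted(view))                   # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(view[(2, 1)])                   # {'u'}
```

Node y can thus learn that the edge (2, 1), if it exists, lives on node u, but it learns nothing about facts that mention values outside {1, 2}.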
Domain-distinct-monotonicity
Definition: A fact f is domain distinct from instance I when adom(f) ⊈ adom(I).
[Figure: example facts f and f′ relative to instance I]
Domain-distinct-monotonicity
Definition: A query Q is domain-distinct-monotone if Q(I) ⊆ Q(I ∪ J) for all I and J, with J containing only facts that are domain distinct from I.
Notation: M_distinct = class of domain-distinct-monotone queries
M ⊆ M_distinct
Remark: M_distinct is the class of queries preserved under extensions.
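The two definitions can be tried out on the open-triangle query. The sketch below assumes directed edges and defines an open triangle as a path a→b→c with no closing edge c→a; all function names are hypothetical. It checks one instance only and is not a proof that the query belongs to M_distinct.

```python
def adom(facts):
    """Active domain: all values occurring in a set of facts."""
    return {value for fact in facts for value in fact}


def is_domain_distinct(fact, instance):
    """True when the fact uses at least one value outside adom(instance)."""
    return not set(fact) <= adom(instance)


def open_triangles(edges):
    """All (a, b, c) with edges a->b, b->c but no edge c->a."""
    return {(a, b, c)
            for (a, b) in edges
            for (b2, c) in edges
            if b2 == b and (c, a) not in edges and len({a, b, c}) == 3}


I = {(1, 2), (2, 3)}
closing = (3, 1)  # uses only values from adom(I): not domain distinct
fresh = (3, 4)    # introduces the new value 4: domain distinct

print(is_domain_distinct(closing, I))  # False
print(is_domain_distinct(fresh, I))    # True

# The domain-distinct fact keeps the existing open triangle (1, 2, 3).
print(open_triangles(I) <= open_triangles(I | {fresh}))    # True
# The closing edge destroys it, but such a J is excluded by the definition.
print(open_triangles(I) <= open_triangles(I | {closing}))  # False
```

In this encoding the only fact that can invalidate an open triangle is its closing edge, and that edge mentions only values already present in I.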