Query Processing in a Self-Organized Storage System
Hannes Mühleisen, supervised by Robert Tolksdorf
Distributed DBs - Goals
• Scalability
  • Data, Queries, Nodes
• Robustness
  • Node/Network failure
• Adaptiveness
  • “Fair” distribution of load
Clustered / Federated [Bernstein81, Epstein78]
[Diagram: nodes S1 to S6 and C1]
Global Laws [Harren02, Karnstedt04, Rösch05]
[Diagram: nodes S1 to S6, each labeled with one of the key ranges 0-1, 1-2, 2-3, 3-4, 4-5, 5-6]
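The global-law idea on this slide, where a fixed rule decides which node owns which key range, can be sketched roughly as follows. The concrete node-to-range assignment, the name `responsible_node`, and the use of half-open ranges are assumptions made for illustration; they are not taken from the cited systems.

```python
from bisect import bisect_right

# Hypothetical assignment: node S(i+1) owns the half-open key range [i, i+1).
BOUNDARIES = [0, 1, 2, 3, 4, 5, 6]
NODES = ["S1", "S2", "S3", "S4", "S5", "S6"]


def responsible_node(key: float) -> str:
    """Return the node whose key range contains `key`."""
    if not BOUNDARIES[0] <= key < BOUNDARIES[-1]:
        raise KeyError(f"key {key!r} is outside the partitioned key space")
    # bisect_right counts the boundaries <= key; subtracting one
    # gives the index of the range (and node) that contains the key.
    return NODES[bisect_right(BOUNDARIES, key) - 1]
```

Because every node can evaluate the same rule locally, a lookup needs no central catalog; the price is that the rule itself is fixed, which is the limited adaptability the comparison slide attributes to this paradigm.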
Probabilistic Request Routing [Lindgren03]
[Diagram: a request for #B is routed across nodes S1 to S6; links are labeled with probabilities (85%, 10%, 95%, 95%, 50%, 25%, 70%, 50%)]
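One way to read the diagram is that each node keeps, per requested item, an estimate of how likely each neighbour is to lead to it, and forwards the request accordingly. The sketch below illustrates that reading only. The routing table contents, the function name `next_hop`, and the choice of weighted random sampling (instead of, say, always taking the maximum) are assumptions, not the method of [Lindgren03].

```python
import random

# Hypothetical routing table of a single node:
# requested item -> {neighbour: estimated probability of success}.
ROUTING_TABLE = {"#B": {"S3": 0.10, "S4": 0.95, "S6": 0.50}}


def next_hop(table, item, rng):
    """Pick a neighbour for `item`, weighted by its estimated probability.

    Returns None when no neighbour has a non-zero estimate.
    """
    candidates = {n: p for n, p in table.get(item, {}).items() if p > 0}
    if not candidates:
        return None
    neighbours = list(candidates)
    weights = [candidates[n] for n in neighbours]
    return rng.choices(neighbours, weights=weights, k=1)[0]


# Example: forward a request for "#B" using a seeded generator.
hop = next_hop(ROUTING_TABLE, "#B", random.Random(0))
```

Sampling rather than always following the best link keeps some traffic on weaker links, which is one plausible way to spread load; a deterministic maximum would be an equally valid variant of the sketch.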
[Wilensky97, NetLogo Ants model]
Distribution Paradigms

Paradigm                     Complex Queries  Scalability  Adaptability  Robustness  Completeness
Stand-Alone                  ✓                low          high          low         high
Federated                    ✓                high         high          fair        high
Global-Law                   ✓                high         fair          high        high
Probabilistic (e.g. Swarms)  ?                high         high          high        fair
Research Question
Can complex queries be evaluated efficiently in a swarm-based distributed storage system?
Mutable Moving Query Plans [Papadimos03, Battré08]
[Diagram: query processing pipeline parse ✓, rewrite ✓, optimize ↺, execute ↺, annotated "move & repeat ↺", "based on?", "where?" and "Catalog ✗"]
[Figure: query plan trees built from joins (⋈), selections (σ) and relations (r), with steps ① to ③ and ↺ markers]
[Figure: two routing steps ① and ② for the relations r(*) and r(#); links are labeled with probabilities p(#)=2%, p(#)=53%, p(*)=78%, p(*)=3%, p(#)=2%, p(*)=10%]
Handling Routing Failures
[Figure: at step ① the plan needs r(#), but the links shown have p(#)=0%. What now? Trackback!]
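The slide only names the remedy ("Trackback!"). One plausible reading is that a plan which reaches a node with no promising link steps back to the node it came from and tries the next-best link there. The sketch below shows that reading as a simple backtracking walk; the graph, the function name `route_with_trackback`, the greedy choice of the highest probability, and the hop limit are all assumptions made for illustration.

```python
def route_with_trackback(links, holders, start, max_hops=100):
    """Walk from `start` towards any node in `holders`.

    links:   node -> {neighbour: estimated probability for the wanted relation}
    holders: set of nodes that actually store the wanted relation
    Returns the path taken, or None if no holder is reachable in `max_hops`.
    """
    path = [start]
    visited = {start}
    for _ in range(max_hops):
        if not path:
            return None  # tracked back past the start: nothing reachable
        node = path[-1]
        if node in holders:
            return path
        # Only unvisited neighbours with a non-zero estimate are candidates.
        options = [(p, n) for n, p in links.get(node, {}).items()
                   if p > 0 and n not in visited]
        if options:
            _, best = max(options)
            visited.add(best)
            path.append(best)
        else:
            path.pop()  # dead end (all p = 0): track back one hop
    return None


# Hypothetical network: S2 is a dead end, so the walk tracks back to S1.
LINKS = {"S1": {"S2": 0.7, "S3": 0.2}, "S2": {"S4": 0.0}, "S3": {"S5": 0.5}}
route = route_with_trackback(LINKS, {"S5"}, "S1")
```

In this hypothetical network the walk first follows the stronger link to S2, finds only a 0% link there, returns to S1, and then reaches S5 via S3.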
Evaluation Methodology
[Chart mock-up, not actual data: # participating nodes per query (axis 0 to 15) for Query 1 to Query 4; series: Optimal Plan, Moving Mutable Plans, Static Plan Routing; an arrow marks the "better" direction]
Evaluation Methodology
[Chart mock-up, not actual data: # results per query (axis 0 to 600) for Query 1 to Query 4; series: Optimal Plan, Moving Mutable Plans, Static Plan Routing; an arrow marks the "better" direction]
Thank You! Questions? Web Page: http://hannes.muehleisen.org