Freddies: DHT-Based Adaptive Query Processing via Federated Eddies
Ryan Huebsch, Shawn Jeffery
CS 294-4: Peer-to-Peer Systems
12/9/03
Outline
• Background: PIER
• Motivation: Adaptive Query Processing (Eddies)
• Federated Eddies (Freddies)
  • System Model
  • Routing Policies
  • Implementation
• Experimental Results
• Conclusions and Continuing Work
PIER
• Fully decentralized relational query processing engine
• Principles:
  • Relaxed Consistency
  • Organic Scaling
  • Data in its Natural Habitat
  • Standard Schemas via Grassroots Software
• Relational queries can be executed in a number of logically equivalent ways
  • An optimization step chooses the best-performing one
• Currently, PIER has no means to optimize queries
Adaptive Query Processing
• Traditional query optimization occurs at query time and is based on statistics. This is hard because:
  • The catalog (statistics) must be accurate and maintained
  • The optimizer cannot recover from poor choices
• The story gets worse!
  • Long-running queries:
    • Changing selectivities/costs of operators
    • Assumptions made at query time may no longer hold
  • Federated/autonomous data sources:
    • No control over or knowledge of statistics
  • Heterogeneous data sources:
    • Different arrival rates
• Thus, adaptive query processing systems attempt to change execution order during the query
  • Query Scrambling, Tukwila, Wisconsin, Eddies
Eddies
• Eddy: a tuple router that dynamically chooses the order of operators in a query plan
  • Optimizes the query at runtime on a per-tuple basis
  • Monitors selectivities and costs of operators to determine where to send a tuple next (sketched below)
• Currently centralized in design and implementation
  • Some other efforts toward distributed Eddies from Wisconsin & Singapore (neither uses a DHT)
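The following is a minimal, self-contained sketch (in Java, since PIER is a Java system) of the per-tuple Eddy routing idea: operators are treated as filters, the Eddy tracks each operator's observed pass rate, and each tuple is greedily routed to the most selective operator it has not yet visited. All class and method names are hypothetical illustrations, not PIER or Eddies code, and the original Eddies work uses lottery-style scheduling rather than this greedy rule.

import java.util.*;

interface Operator {
    boolean apply(Map<String, Object> tuple);   // true if the tuple survives this filter
}

class Eddy {
    private final List<Operator> ops;
    private final long[] seen;     // tuples routed to each operator so far
    private final long[] passed;   // tuples that survived each operator

    Eddy(List<Operator> ops) {
        this.ops = ops;
        this.seen = new long[ops.size()];
        this.passed = new long[ops.size()];
    }

    // Routes one tuple through all operators, choosing the order adaptively.
    boolean route(Map<String, Object> tuple) {
        BitSet done = new BitSet(ops.size());
        while (done.cardinality() < ops.size()) {
            int next = pickNext(done);
            done.set(next);
            seen[next]++;
            if (!ops.get(next).apply(tuple)) {
                return false;                    // tuple dropped: skip the remaining operators
            }
            passed[next]++;
        }
        return true;                             // tuple satisfied every operator
    }

    // Prefer the not-yet-applied operator with the lowest observed pass rate.
    private int pickNext(BitSet done) {
        int best = -1;
        double bestRate = Double.MAX_VALUE;
        for (int i = 0; i < ops.size(); i++) {
            if (done.get(i)) continue;
            double rate = (seen[i] == 0) ? 0.5 : (double) passed[i] / seen[i];
            if (rate < bestRate) { bestRate = rate; best = i; }
        }
        return best;
    }
}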
Why use Eddies in P2P? (The easy answers)
• Much of the promise of P2P lies in its fully distributed nature
  • No central point of synchronization, hence no central catalog
  • A distributed catalog with statistics helps, but does not solve all problems
    • Possibly stale, hard to maintain
    • Need CAP to do the best optimization
  • No knowledge of available resources or the current state of the system (load, etc.)
  • This is the PIER philosophy!
• Eddies were designed for a federated query processor
  • Changing operator selectivities and costs
  • Federated/heterogeneous data sources
Why Eddies in P2P? (The not-so-obvious answers)
• Available compute resources in a P2P network are heterogeneous and dynamically changing
  • Where should the query be processed?
• In a large P2P system, local data distributions, arrival rates, etc. may differ from the global ones
Freddies: Federated Eddies
• A Freddy is an adaptive query processing operator within the PIER framework
• Goals:
  • Show the feasibility of adaptive query processing in PIER
  • Build a foundation and infrastructure for smarter adaptive query processing
  • Establish a baseline for Freddy performance to improve upon with smarter routing policies
An Example Freddy
[Diagram: source tuples arrive from the DHT via Get(R), Get(S), Get(T); the Freddy routes them among local operators (R join S, S join T), rehashing results to the DHT via Put(Join Value RS) and Put(Join Value ST), and produces the query output.]
System Model
• Same functionality as a centralized Eddy
  • Allows easy concept reuse
• The Freddy uses its routing policy to determine the next operator for a tuple
• Tuples in a Freddy are tagged with DoneBits indicating which operators have processed them
• The Freddy does all state management, so existing operators require no modifications
• Local processing comes first (in most cases)
  • Conserves network bandwidth
  • Not as simple as it seems
• Freddy: decide how to rehash a tuple
  • This determines the join order
• Challenge: decoupling of the routing decision from the operator means most Eddy techniques are no longer valid
Query Processing in Freddies
• The query origin creates a query plan with a Freddy
  • Possible routings are determined at this time, but not their order
• Freddy operators on all participating nodes initiate data flow
• As tuples arrive, the Freddy determines the next operator for each tuple based on its DoneBits and the routing policy (see the sketch below)
  • Source tuples are tagged with clean DoneBits and routed appropriately
• When all DoneBits are set, the tuple is sent to the output operator (returned to the query origin)
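As a concrete illustration of the flow above, here is a minimal sketch of the per-tuple routing step a Freddy performs when a tuple arrives: consult the routing policy for the next operator among those whose DoneBit is still clear, rehash the tuple into the DHT toward that operator, and send it to the output once every bit is set. The class names, the RoutingPolicy interface, and the DHT calls are hypothetical stand-ins, not PIER's actual API; setting the DoneBit at routing time (rather than when the join result is produced) is a simplification.

import java.util.BitSet;

class FreddyTuple {
    Object[] fields;
    BitSet doneBits;                  // one bit per operator in the query plan
}

interface RoutingPolicy {
    // Chooses the next operator index among those whose DoneBit is still clear.
    int chooseNext(FreddyTuple tuple, BitSet doneBits);
}

class Freddy {
    private final int numOperators;
    private final RoutingPolicy policy;

    Freddy(int numOperators, RoutingPolicy policy) {
        this.numOperators = numOperators;
        this.policy = policy;
    }

    // Invoked whenever a (local or remote) tuple arrives at this node's Freddy.
    void onTupleArrival(FreddyTuple tuple) {
        if (tuple.doneBits.cardinality() == numOperators) {
            sendToOutput(tuple);                  // all operators applied: return to the query origin
            return;
        }
        int next = policy.chooseNext(tuple, tuple.doneBits);
        tuple.doneBits.set(next);                 // simplification: mark the operator done when routing to it
        rehashToOperator(next, tuple);            // DHT put keyed on the next operator's join attribute
    }

    private void sendToOutput(FreddyTuple tuple) {
        // forward the finished tuple back to the node that issued the query
    }

    private void rehashToOperator(int operatorIndex, FreddyTuple tuple) {
        // e.g. dht.put(joinKey(operatorIndex, tuple), tuple) -- hypothetical DHT call
    }
}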
Tuple Routing Policy
• Determines to which operator to send a tuple
• Local information
  • Messages are expensive
  • Monitor local usage and adjust locally
• “Processing buddy” information
  • During processing, discover general trends in input/output nodes’ processing capabilities, output rates, etc.
  • For instance, we may want to alert the previous Freddy of poor PUT decisions
• The design space is huge: a large research area
Freddy Routing Policies
• Simple (KISS):
  • Static
  • Random: not as bad as you may think
  • Local stat monitoring (sampling)
• More complex:
  • Queue lengths (sketched below)
    • Somewhat analogous to the “back-pressure” effect
    • Monitors DHT PUT ACKs
  • Load balancing through “learning” of the global join key distribution
  • Piggyback stats on other messages
    • Don’t need global information, only stats about processing buddies (nodes with which we communicate)
    • A different sample than the local one; may or may not be better
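To make the queue-length idea concrete, here is a minimal sketch of a back-pressure routing policy of the kind the EddyQL experiment later uses: each operator's destination is charged for every un-acknowledged DHT PUT, and tuples are routed toward the eligible operator with the fewest outstanding PUTs. The counters, method names, and ACK hooks are assumptions for illustration, not PIER's actual interfaces.

import java.util.BitSet;

class QueueLengthPolicy {
    private final int[] outstandingPuts;   // PUTs sent toward each operator but not yet ACKed

    QueueLengthPolicy(int numOperators) {
        this.outstandingPuts = new int[numOperators];
    }

    // Called when a tuple is rehashed toward an operator's destination node.
    void onPutSent(int operatorIndex)  { outstandingPuts[operatorIndex]++; }

    // Called when the DHT acknowledges one of those PUTs.
    void onPutAcked(int operatorIndex) { outstandingPuts[operatorIndex]--; }

    // Choose the not-yet-done operator whose destination currently looks least loaded.
    int chooseNext(BitSet doneBits) {
        int best = -1;
        for (int i = 0; i < outstandingPuts.length; i++) {
            if (doneBits.get(i)) continue;
            if (best < 0 || outstandingPuts[i] < outstandingPuts[best]) best = i;
        }
        return best;
    }
}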
Implementation & Experimental Setup � Design Decisions: � Simplicity is key � Roughly 300 of NCSS (PIER is about 5300) � Single query processing operator � Separate routing policy module loaded at query time � Possible routing orders determined by simple optimizer � Required generalizations to the PIER execution engine to deal with generic operators � Allow PIER to run any dataflow operator � Simulator with 256 nodes, 100 tuples/table/node � Feasibility, not scalability � In the absence of global (or stale) knowledge, a static optimizer could chose any join ordering � we compare Freddy performance to all possible static plans
3-Way Join
• R join S join T
• R join S is expensive (multiplies the tuple count by 25)
• S join T is highly selective (drops 90%)
• Possible static join orderings: (R join S) join T, and (S join T) join R
  [Diagram: the two static join trees]
3-Way Join Results
[Chart: completion time (s) vs. bandwidth per node (25–150 KB/s) for the static plans RST and STR and the Eddy.]
4-Way Join
• R join S join T join U
• S join T is expensive
• Possible static join orderings: RSTU, STRU, STUR, TUSR, and a bushy plan
  [Diagram: the corresponding join trees]
• Note: a traditional optimizer can’t produce the bushy plan
4-Way Join Results
[Chart: completion time (s) vs. bandwidth per node (50–150 KB/s) for the static plans RSTU, STRU, STUR, TUSR, the bushy plan, and the Eddy.]
The Promise of Routing Policy
• Illustrative example of how routing policy can improve performance
• This is not meant to be an exhaustive comparison of policies, but rather to show the possibilities
• EddyQL considers the number of outstanding PUTs (queue length) to decide where to send a tuple
[Chart: aggregate bandwidth (MB/s) for RST, STR, Eddy, and EddyQL.]
Conclusions and Continuing Work
• Freddies provide adaptive query processing in a P2P system
  • Require no global knowledge
  • Baseline performance shows promise for smarter policies
• In the future…
  • Explore Freddy performance in a dynamic environment
  • Explore more complex routing policies
Questions? Comments? Snide remarks for Ryan? Glorious praise for Shawn? Thanks!