Encapsulation of Parallelism in the Volcano Query Processing System
Huawei Wang
Overview
• Architecture
• Bracket Model
• Operator Model
• Pros & Cons
• Comparison
Typical Query Engine Architecture
Similar Systems
• System R
• Starburst
Bracket Model
Problems:
• Extensibility
• Large overhead
Operator Model
• Single process: each operator maps to an iterator (the approach can also be applied to big data systems)
• Streams are the abstraction for input passed between operators
• Multiple processes: introduce the exchange operator
  • Vertical parallelism
  • Horizontal parallelism
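The single-process iterator idea can be sketched in a few lines. This is only an illustration, not Volcano's actual code: the class names `Scan` and `Filter`, the helper `run`, and the use of `None` as the end-of-stream signal are all assumptions made for the example.

```python
class Scan:
    """Leaf operator: streams rows from an in-memory list (stand-in for a file scan)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.pos = 0

    def open(self):
        self.pos = 0

    def next(self):
        # Return the next row, or None to signal end of stream (assumed convention).
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Filter:
    """Pulls rows from its child iterator and passes on those matching a predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()

    def next(self):
        while True:
            row = self.child.next()
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


def run(plan):
    """Drive a plan to completion through open / next / close only."""
    plan.open()
    out = []
    row = plan.next()
    while row is not None:
        out.append(row)
        row = plan.next()
    plan.close()
    return out
```

Because every operator exposes the same three calls, `run(Filter(Scan(range(1, 7)), lambda r: r % 2 == 0))` can be composed without `Filter` knowing what kind of operator feeds it.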
Operator Model
Vertical Parallelism
• Inter-process communication (fast)
  • Shared memory
  • Semaphore
• Exchange iterator procedures: open_exchange, next_exchange, close_exchange
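A rough sketch of how an exchange operator can hide a producer/consumer boundary behind the usual open/next/close interface. It assumes a thread and a bounded `queue.Queue` as stand-ins for Volcano's forked process, shared memory, and semaphores. The names `Exchange`, `ListScan`, and `_END` are hypothetical, and the methods play the roles of open_exchange, next_exchange, and close_exchange.

```python
import queue
import threading

_END = object()  # end-of-stream marker the producer puts on the queue (assumed)


class ListScan:
    """Minimal child iterator so the example is self-contained."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.pos = 0

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.rows):
            return None
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Exchange:
    """Runs its child in a separate thread; the parent still sees a plain iterator."""

    def __init__(self, child, capacity=4):
        self.child = child
        self.capacity = capacity
        self.done = False

    def open(self):
        # Bounded queue: put() blocks when full, giving simple flow control.
        self.port = queue.Queue(maxsize=self.capacity)
        self.done = False
        self.worker = threading.Thread(target=self._produce, daemon=True)
        self.worker.start()

    def _produce(self):
        # Producer side: drive the child iterator and push rows to the port.
        self.child.open()
        row = self.child.next()
        while row is not None:
            self.port.put(row)
            row = self.child.next()
        self.child.close()
        self.port.put(_END)

    def next(self):
        # Consumer side: block until a row or the end marker arrives.
        if self.done:
            return None
        row = self.port.get()
        if row is _END:
            self.done = True
            return None
        return row

    def close(self):
        # This sketch assumes the consumer has read to end of stream before closing.
        self.worker.join()
```

The operators above and below the exchange need no changes, which is the sense in which parallelism is encapsulated.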
Horizontal Parallelism
• Bushy parallelism
  [Figure: a join (consumer) fed by two producers, sort A and sort B]
• Intra-operator parallelism
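Bushy parallelism from the join example can be illustrated as follows: the two sorts run concurrently and the join consumes both results. This is a simplified sketch. A thread pool stands in for separate processes, the function names are hypothetical, and the merge join assumes unique keys on each side.

```python
from concurrent.futures import ThreadPoolExecutor


def merge_join(left, right):
    """Join two lists of (key, value) pairs sorted by key; assumes unique keys per side."""
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            out.append((lk, left[i][1], right[j][1]))
            i += 1
            j += 1
    return out


def bushy_sort_merge_join(a, b):
    """Run 'sort A' and 'sort B' as independent subtrees, then join their outputs."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        sorted_a = pool.submit(sorted, a)  # subtree 1
        sorted_b = pool.submit(sorted, b)  # subtree 2
        return merge_join(sorted_a.result(), sorted_b.result())
```

Note that in CPython threads only show the structure, not a real speedup for CPU-bound sorts. A process pool would be closer to what the slides describe.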
Horizontal Parallelism
How to partition data?
[Figure: a producer writes to a port with three queues (Queue1, Queue2, Queue3), each read by its own consumer (Consumer1, Consumer2, Consumer3)]
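One way to answer the partitioning question is to let the producer pick a queue per row with a partitioning function. The sketch below shows two common choices, hash and round-robin. The function names and the use of `deque` objects as the port's queues are assumptions for illustration.

```python
from collections import deque


def hash_partition(rows, key, n_consumers):
    """Send each row to the queue chosen by hashing its partitioning key."""
    queues = [deque() for _ in range(n_consumers)]
    for row in rows:
        queues[hash(key(row)) % n_consumers].append(row)
    return queues


def round_robin_partition(rows, n_consumers):
    """Send rows to the queues in turn, ignoring their contents."""
    queues = [deque() for _ in range(n_consumers)]
    for i, row in enumerate(rows):
        queues[i % n_consumers].append(row)
    return queues
```

Hash partitioning keeps rows with equal keys in the same queue, which matters for operators such as joins and aggregations. Round-robin only balances the load.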
Horizontal Parallelism
How to create the processes?
• Centralized scheme
• Propagation tree scheme
• Primed processes
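The difference between the first two schemes can be illustrated with a back-of-the-envelope model. It assumes, purely for illustration, that one fork takes one time step and that in the propagation tree every existing process forks one child per step. The function names are hypothetical.

```python
def centralized_steps(n):
    """Master forks the other n - 1 processes one after another."""
    return max(n - 1, 0)


def propagation_tree_steps(n):
    """Every existing process forks one child per step, so the group doubles."""
    steps, alive = 0, 1
    while alive < n:
        alive = min(n, 2 * alive)
        steps += 1
    return steps
```

Under this model, starting 8 processes takes 7 steps centrally but 3 steps with a propagation tree. Primed processes avoid the cost at query time by creating the processes ahead of time.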
Pros & Cons
• More generalized
  • Algorithm level
  • System level
• Easy implementation
• Process creation is heavyweight
Comparison
• Spark can choose whether to persist an RDD
• Volcano keeps intermediate results only in buffers
• Volcano is only a query execution engine, with two key meta-operators