Encapsulation of Parallelism in the Volcano Query Processing System - PowerPoint PPT Presentation


  1. Encapsulation of Parallelism in the Volcano Query Processing System Huawei Wang

  2. Overview
     • Architecture
     • Bracket Model
     • Operator Model
     • Pros & Cons
     • Comparison

  3. Typical Query Engine Architecture

  4. Similar Systems
     • System R
     • Starburst

  5. Bracket Model
     Problems:
     • Extensibility
     • Large overhead

  6. Operator Model
     • Single process: each operator maps to an iterator (this can also be applied to big data systems)
     • Streams serve as the abstraction for input between operators
     • Multiple processes: introduce the exchange operator
     • Vertical parallelism
     • Horizontal parallelism

  7. Operator Model
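To make the single-process operator model concrete, here is a minimal sketch of operators as iterators, each exposing `open`, `next`, and `close` and pulling records from its input stream on demand. The class names `Scan` and `Filter` and the helper `run` are hypothetical, chosen only for illustration; they are not taken from the slides.

```python
from typing import Callable, Iterable, Optional


class Scan:
    """Hypothetical leaf operator: streams rows from an in-memory list."""

    def __init__(self, rows: Iterable[tuple]):
        self._rows = list(rows)
        self._pos = 0

    def open(self) -> None:
        self._pos = 0

    def next(self) -> Optional[tuple]:
        # Return the next row, or None once the stream is exhausted.
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self) -> None:
        pass


class Filter:
    """Hypothetical operator: passes on only rows that satisfy a predicate."""

    def __init__(self, child, predicate: Callable[[tuple], bool]):
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[tuple]:
        # Pull from the input stream until a row qualifies.
        while (row := self._child.next()) is not None:
            if self._predicate(row):
                return row
        return None

    def close(self) -> None:
        self._child.close()


def run(plan) -> list:
    """Drive a plan from the root: open it, drain it, close it."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out
```

Because every operator has the same three-call interface, operators can be composed freely, for example `run(Filter(Scan(rows), predicate))`, without any operator knowing what kind of operator feeds it.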

  8. Vertical Parallelism
     • Inter-process communication (fast)
     • Shared memory
     • Semaphore
     • Exchange procedures: open_exchange, next_exchange, close_exchange
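The following is a rough sketch of how an exchange operator can decouple a producer from a consumer while keeping the open/next/close interface. It is an illustration under stated assumptions, not the Volcano implementation: it swaps in a Python thread and a bounded `queue.Queue` in place of a separate process, shared memory, and semaphores, and the names `Exchange` and `ListScan` are hypothetical.

```python
import queue
import threading

_END = object()  # end-of-stream marker placed on the queue by the producer


class ListScan:
    """Hypothetical leaf operator with the open/next/close interface."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pos = 0

    def open(self):
        self._pos = 0

    def next(self):
        if self._pos >= len(self._rows):
            return None
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self):
        pass


class Exchange:
    """Runs its input subtree in a producer thread (assumption: a thread
    stands in for the separate process used by the real system)."""

    def __init__(self, child, capacity=4):
        self._child = child
        self._queue = queue.Queue(maxsize=capacity)  # bounded buffer
        self._producer = None
        self._done = False

    def _produce(self):
        # Producer side: drive the child and push its rows into the buffer.
        self._child.open()
        while (row := self._child.next()) is not None:
            self._queue.put(row)  # blocks while the buffer is full
        self._child.close()
        self._queue.put(_END)

    def open(self):
        self._done = False
        self._producer = threading.Thread(target=self._produce, daemon=True)
        self._producer.start()

    def next(self):
        # Consumer side: block until a row or the end marker arrives.
        if self._done:
            return None
        row = self._queue.get()
        if row is _END:
            self._done = True
            return None
        return row

    def close(self):
        # Drain remaining rows so the producer is never left blocked.
        while not self._done:
            self.next()
        self._producer.join()
```

The consumer calls `open`, `next`, and `close` on the exchange exactly as it would on any other operator, so the parallelism stays hidden inside the exchange. The bounded queue also gives simple flow control, since a fast producer blocks once the buffer is full.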

  9. Horizontal Parallelism
     • Bushy parallelism (figure labels: join, Producer, Consumer, sort A, sort B)
     • Intra-operator parallelism

  10. Horizontal Parallelism
      • How to partition data?
      • (figure labels: Producer, Port, Queue1/Consumer1, Queue2/Consumer2, Queue3/Consumer3)

  11. Horizontal Parallelism
      • Centralized scheme
      • Propagation tree scheme
      • Primed process
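On the question of how a producer partitions data across consumer queues, the sketch below shows two common choices, round-robin and hash partitioning. The slides do not say which function is used, so both are assumptions for illustration, and the names `partition`, `round_robin_partitioner`, and `hash_partitioner` are hypothetical. Plain deques stand in for the per-consumer queues behind a port.

```python
from collections import deque
from itertools import count
from typing import Callable, Hashable, Iterable, List


def round_robin_partitioner(n: int) -> Callable[[object], int]:
    """Send record i to queue i mod n, ignoring the record's content."""
    counter = count()

    def choose(record) -> int:
        return next(counter) % n

    return choose


def hash_partitioner(key_of: Callable[[object], Hashable],
                     n: int) -> Callable[[object], int]:
    """Send records with equal keys to the same queue."""

    def choose(record) -> int:
        return hash(key_of(record)) % n

    return choose


def partition(records: Iterable, n_consumers: int,
              choose: Callable[[object], int]) -> List[deque]:
    """Producer side: route each record to one of n_consumers queues."""
    queues = [deque() for _ in range(n_consumers)]
    for record in records:
        queues[choose(record)].append(record)
    return queues
```

Round-robin balances load evenly but scatters equal keys, while hash partitioning keeps equal keys together, which matters when the consumers run an operator such as a join or an aggregation on that key.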

  12. Pros & Cons
      • More generalized
      • Algorithm level
      • System level
      • Easy implementation
      • Process creation is heavyweight

  13. Comparison
      • Spark can choose whether to persist an RDD
      • Volcano keeps intermediate results only in buffers
      • Volcano is only a query execution engine with 2 key meta-operators
