Eliminating the Bandwidth Bottleneck of Central Query Dispatching Through TCP Connection Hand-Over
Stefan Klauck 1, Max Plauth 1, Sven Knebel 1, Marius Strobl 2, Douglas Santry 2, Lars Eggert 2
1 Hasso Plattner Institute, University of Potsdam, Germany
2 .
March, 2019
Image: wolfro54, CC BY-NC-ND 2.0
Motivation
In scale-out database systems, queries must be routed to individual servers.
■ Central Dispatcher: Client 1 … Client m ↔ Dispatcher ↔ DB Backend 1 … DB Backend n
□ + Simple clients / dynamic backends
□ - Central dispatcher is a potential bottleneck
■ Direct Communication: Client 1 … Client m ↔ DB Backend 1 … DB Backend n
□ + Lower latency
□ - Requires smart clients or static backends
Motivation – Use Cases for Central Dispatching
■ Horizontal Partitioning / Sharded Database
[Figure: Client 1 … Client m ↔ Dispatcher ↔ Shard 1 (1 2 3 4) and Shard 2 (5 6 7 8)]
■ Partially Replicated Database System
□ Maximize throughput by balancing the load evenly while minimizing memory footprint
[Figure: database scale-out from one node to Replicas 1–4 behind a dispatcher; queries q1–q5 with workload shares 10%, 15%, 25%, 20%, 30% are routed so that each replica carries 25% of the load]
Rabl and Jacobsen. Query Centric Partitioning and Allocation for Partially Replicated Database Systems. SIGMOD 2017.
Klauck and Schlosser. Workload-Driven Fragment Allocation for Partially Replicated Databases Using Linear Programming. ICDE 2019.
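The load-balancing claim for the partially replicated example can be checked with a little arithmetic. The sketch below assumes the figure reads as follows: queries q1 to q5 cause 10%, 15%, 25%, 20%, and 30% of the workload, and q5 is split 50% / 33.3% / 16.6% across three replicas. The assignment of queries to replica numbers and the function name `replica_loads` are reconstructions for illustration, not taken from the cited papers.

```python
from fractions import Fraction

# Share of the total workload caused by each query (as read from the example figure).
workload_share = {
    "q1": Fraction(10, 100),
    "q2": Fraction(15, 100),
    "q3": Fraction(25, 100),
    "q4": Fraction(20, 100),
    "q5": Fraction(30, 100),
}

# Fraction of each query's executions sent to each replica.
# Assumption: 33.3% and 16.6% in the figure stand for 1/3 and 1/6.
routing = {
    "Replica 1": {"q1": Fraction(1), "q5": Fraction(1, 2)},
    "Replica 2": {"q2": Fraction(1), "q5": Fraction(1, 3)},
    "Replica 3": {"q3": Fraction(1)},
    "Replica 4": {"q4": Fraction(1), "q5": Fraction(1, 6)},
}


def replica_loads(workload_share, routing):
    """Return the share of the total workload that each replica has to process."""
    return {
        replica: sum(workload_share[query] * part for query, part in parts.items())
        for replica, parts in routing.items()
    }


loads = replica_loads(workload_share, routing)
for replica, load in loads.items():
    print(replica, float(load))
```

Under these assumptions every replica processes exactly one quarter of the workload, while each replica only needs the data touched by the queries routed to it.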
Motivation – Central Dispatching from a Network Perspective
>>> import psycopg2
>>> conn = psycopg2.connect("dbname='tpch' host='dispatcher'")
■ Logical view
□ Client side: client1:65140 and client2:65144 connect to dispatcher:5432
□ Backend side: dispatcher:65228 and dispatcher:65231 connect to database1:5432 and database2:5432
■ Physical view
□ Client 1 … Client m, the Dispatcher, and DB Backend 1 … DB Backend n are all attached to a Switch; the same connections and endpoints as in the logical view run through it
Motivation
■ Whether the dispatcher becomes a bottleneck depends on the workload
□ Number and size of queries/messages
□ Ratio of processed tuples to result set size
■ “Transferring a large amount of data out of a database system to a client program is a common task.” Raasveldt and Mühleisen. Don’t Hold My Data Hostage – A Case For Client Protocol Redesign. VLDB 2017.
□ Needed for statistical analyses or machine learning in clients
□ Main bottleneck is network bandwidth
Research Goals
■ Integration of a TCP connection hand-over by means of a reprogrammable network switch into a database
■ Comparison of query-based dispatching approaches in terms of
□ Throughput scaling
□ Processing flexibility
Dispatcher Implementations
■ Traditional architecture with two separate TCP connections: client ↔ dispatcher ↔ database
1. HAProxy – free and open source TCP/HTTP load balancer
2. Hyrise dispatcher https://github.com/hyrise
■ Using a reprogrammable switch to perform TCP connection hand-over
3. Prism: exchange most packets directly between client and backend
Y. Hayakawa et al. Prism: A Proxy Architecture for Datacenter Networks. SoCC 2017.
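To make the traditional architecture concrete, here is a minimal sketch of a dispatcher that terminates the client connection and opens a second TCP connection to a backend. It is not how HAProxy or the Hyrise dispatcher are implemented; the helper names (`serve_once`, `backend_handler`, `make_dispatcher_handler`, `run_query`) and the one-request toy protocol are assumptions made for illustration. Everything runs on the loopback interface.

```python
import socket
import threading


def listen_on_loopback():
    """Open a listening TCP socket on a free loopback port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    return listener


def serve_once(listener, handler):
    """Accept a single connection in a background thread and run handler on it."""
    def run():
        conn, _ = listener.accept()
        with conn:
            handler(conn)
    threading.Thread(target=run, daemon=True).start()


def backend_handler(conn):
    """Stand-in for a database backend: answer one small request."""
    request = conn.recv(1024)
    conn.sendall(b"result for " + request)


def make_dispatcher_handler(backend_addr):
    """Dispatcher: second TCP connection to the backend, results relayed back."""
    def handler(client_conn):
        request = client_conn.recv(1024)
        with socket.create_connection(backend_addr) as backend_conn:
            backend_conn.sendall(request)
            backend_conn.shutdown(socket.SHUT_WR)
            # Every result byte crosses the dispatcher on its way to the client.
            while chunk := backend_conn.recv(4096):
                client_conn.sendall(chunk)
    return handler


def run_query(query: bytes) -> bytes:
    """Send one query through the dispatcher and return the relayed reply."""
    backend = listen_on_loopback()
    dispatcher = listen_on_loopback()
    serve_once(backend, backend_handler)
    serve_once(dispatcher, make_dispatcher_handler(backend.getsockname()))
    with socket.create_connection(dispatcher.getsockname()) as client:
        client.sendall(query)
        client.shutdown(socket.SHUT_WR)
        reply = b""
        while chunk := client.recv(4096):
            reply += chunk
    backend.close()
    dispatcher.close()
    return reply


print(run_query(b"SELECT 1"))
```

The relay loop is the point of the sketch: with two separate connections, the full result set has to pass through the dispatcher's network interface, which is what a TCP connection hand-over avoids.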
Dispatcher Implementations - Prism
■ Client query is initially sent/routed to Prism Controller
■ Prism Controller forwards connection to an appropriate backend and reprograms the switch
■ Backend processes query and sends result directly to the client (bypassing the Prism Controller)
■ Backend hands back connection to Prism Controller
[Figure: Prism Switch (Switch Logic and Prism Interface) between Client and DB Backend. Switch Logic: Lookup(Src IP, Src TCP Port, Dst IP, Dst Port), Transform Rules, Rewrite Packet Information. Links to the Prism Controller: Unmatched Packets, Connection Hand-Off/Hand-Back]
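The lookup and rewrite step in the switch can be sketched as a table keyed by the connection 4-tuple. This is only an illustration of the idea shown in the figure, not Prism's actual data structures: the class names, the `hand_off`/`hand_back` methods, and the choice to rewrite only addresses and ports are assumptions, and host names stand in for IP addresses.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (src IP, src TCP port, dst IP, dst TCP port)
FlowKey = Tuple[str, int, str, int]


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes = b""

    def key(self) -> FlowKey:
        return (self.src_ip, self.src_port, self.dst_ip, self.dst_port)


@dataclass(frozen=True)
class TransformRule:
    """Hypothetical rule: replace the address fields of matching packets."""
    new_src: Tuple[str, int]
    new_dst: Tuple[str, int]


class SwitchSketch:
    def __init__(self) -> None:
        self.rules: Dict[FlowKey, TransformRule] = {}

    def hand_off(self, key: FlowKey, rule: TransformRule) -> None:
        """Controller installs a rule when it hands a connection to a backend."""
        self.rules[key] = rule

    def hand_back(self, key: FlowKey) -> None:
        """Rule is removed when the connection returns to the controller."""
        self.rules.pop(key, None)

    def forward(self, packet: Packet) -> Tuple[str, Packet]:
        """Look up the 4-tuple; rewrite on a match, else send to the controller."""
        rule = self.rules.get(packet.key())
        if rule is None:
            return ("controller", packet)  # unmatched packet
        rewritten = Packet(rule.new_src[0], rule.new_src[1],
                           rule.new_dst[0], rule.new_dst[1], packet.payload)
        return ("rewritten", rewritten)


switch = SwitchSketch()
query = Packet("client1", 65140, "dispatcher", 5432, b"SELECT 1")
print(switch.forward(query)[0])  # no rule installed yet
switch.hand_off(query.key(), TransformRule(("client1", 65140), ("database1", 5432)))
print(switch.forward(query)[1].dst_ip)
```

A matching rule for the reverse direction (backend to client, with the source rewritten to the dispatcher's address) would be installed the same way, so the client keeps seeing a single connection to the dispatcher.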
Experimental Evaluation
■ 10Gb and 40Gb Ethernet experiments
□ Hyrise with a stored procedure https://github.com/hyrise
□ wrk - HTTP benchmarking tool https://github.com/wg/wrk
□ mSwitch - software switch. Honda et al. mSwitch: A Highly-Scalable, Modular Software Switch. SOSR 2015.
[Figure: two setups, each with Client 1 (wrk 1), Client 2 (wrk 2), DB Backend 1 (Hyrise 1), and DB Backend 2 (Hyrise 2) attached to a Switch. Load-Balancer setup: Hyrise Dispatcher/HAProxy, mSwitch in Learning Bridge Mode. Prism setup: Prism Controller, mSwitch with Prism Switch Module]
Experimental Evaluation with two Clients and Backends
■ 10 GbE results
[Figure: Throughput [Gb/s] (axis ticks from 10^-4 to 20) over Payload (1B to 32MiB) for Prism, Dispatcher, and HAProxy]
← Prism: scales up to bandwidth: min(Σ clients, Σ backends)
← Dispatcher, HAProxy: limited by bandwidth of central dispatcher
← Throughput for 512 B payload: Prism: 50 Mb/s, Dispatcher: 63 MB/s, HAProxy: 42 MB/s
□ Using TCP hand-over outperforms traditional approaches for large payloads
□ Hyrise dispatcher performs best for small payload sizes up to 4 kB
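The two bandwidth ceilings annotated in the plot can be written down as simple bounds. The sketch below is a back-of-the-envelope model, not the evaluation code: the function names are made up for illustration, and it assumes each machine has one full-duplex link whose capacity is given in Gb/s.

```python
def handover_bound_gbps(client_links, backend_links):
    """Upper bound with TCP hand-over: min(sum of client links, sum of backend links)."""
    return min(sum(client_links), sum(backend_links))


def dispatcher_bound_gbps(client_links, backend_links, dispatcher_link):
    """Upper bound when all results are relayed through one central dispatcher."""
    return min(dispatcher_link, handover_bound_gbps(client_links, backend_links))


# Two clients and two backends on 10 GbE, as in the experiment above.
clients = [10, 10]
backends = [10, 10]
print(handover_bound_gbps(clients, backends))
print(dispatcher_bound_gbps(clients, backends, dispatcher_link=10))
```

For this setup the bounds are 20 Gb/s with hand-over and 10 Gb/s through a central dispatcher, which matches the two plateaus labelled in the plot for large payloads.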