Scheduling Mix-flows in Commodity Datacenters with Karuna




  1. Scheduling Mix-flows in Commodity Datacenters with Karuna
     Li Chen, Kai Chen, Wei Bai, Mohammad Alizadeh (MIT)
     SING Group, CSE Department, Hong Kong University of Science and Technology

  2. Datacenter Transport
     • Deadline flows
       • Meeting deadlines
       • D3, D2TCP, …
     • General (non-deadline) flows
       • Reduce flow completion time (FCT)
       • pFabric, PDQ, PASE, PIAS, …
     We investigate a practical, yet neglected, problem: the coexistence of deadline and non-deadline flows (mix-flow scheduling).

  3. Prior solutions do not work for mix-flows
     Shortest Job First (SJF) scheduling: pFabric, PASE, PIAS, PDQ
     • Flow priority = remaining size
     • Scheduling only by size hurts deadline flows.
     • Problem: unawareness of deadlines.
     [Diagram: deadline and non-deadline flows between two end-hosts over time t.]
     [Figure: deadline miss rate (fraction, 0 to 0.5) vs. percentage of non-deadline flows smaller than deadline flows (0 to 20).]

  4. Prior solutions do not work for mix-flows
     Earliest Deadline First scheduling: pFabric, PASE, PIAS, PDQ
     • Flow priority = time till deadline
     • Prioritizing deadline flows hurts non-deadline flows, especially short ones.
     • Problem: existing transports for deadline flows unnecessarily take all the bandwidth.
     [Diagram: deadline and non-deadline flows between two end-hosts over time t, with the deadlines marked.]
     [Figure: 99th percentile FCT (ms, 0 to 20) vs. percentage of deadline flows in overall traffic (0 to 8). Series: Non-deadline: Overall; Non-deadline: Size<10KB.]

  5. How to schedule mix-flows?
     • Deadline flows
       • Meet deadlines
       • Flow deadline → Priority
     • Non-deadline flows
       • Reduce FCT
       • Flow size → Priority

  6. Karuna
     Key insight: deadline flows should minimally impact non-deadline flows.
     • Deadline flows
       • High priority, with minimal bandwidth to complete just before deadlines.
     • Non-deadline flows
       • Low priority, but take all available bandwidth to reduce FCT.
     [Diagram: at the end-host, deadline flows go through MCP and use the highest priority; non-deadline flows go through a work-conserving SJF transport and use Priority 2 through Priority K in the network fabric.]

  7. Outline: Deadline flows, Non-deadline flows, Implementation, Evaluation
     MCP for deadline flows: meeting deadlines with minimal bandwidth
     MCP = Minimal-impact Congestion control Protocol

  8. MCP: Formulation and solution
     • Objective → Minimal impact
       • Per-packet latency
     • Constraints
       • Meet deadlines
       • Network capacity
     Pipeline: Stochastic Optimization → (Lyapunov Optimization Framework [1]) → Convex Optimization → (Primal Solution) → Per-flow congestion window update function
     [1] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems, Morgan & Claypool, 2010.

  9. MCP: Formulation and solution
     • Objective → Minimal impact
       • Per-packet latency
     • Constraints
       • Meet deadlines
       • Network capacity
     • Solution → Near-deadline completion
     [Figure: two sketches of sending rate vs. time t, one marked with the link capacity and one with the deadline.]
     Pipeline: Stochastic Optimization → (Lyapunov Optimization Framework [1]) → Convex Optimization → (Primal Solution) → Per-flow congestion window update function
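
The "near-deadline completion" idea can be illustrated with simple arithmetic: the smallest constant rate that still finishes a flow on time is its remaining size divided by the time left. The sketch below shows only that arithmetic. It is not MCP's per-flow window update function, which the slides derive through the Lyapunov framework and do not spell out; the function name and the example numbers are hypothetical.

```python
def just_in_time_rate_bps(remaining_bytes: int, time_to_deadline_s: float) -> float:
    """Smallest constant rate (bits/s) that finishes the flow exactly at its deadline."""
    if time_to_deadline_s <= 0:
        raise ValueError("deadline already passed")
    return remaining_bytes * 8 / time_to_deadline_s


# Hypothetical flow: 1 MB left and 20 ms until the deadline, on a 1 Gbps link.
link_capacity_bps = 1e9
rate = just_in_time_rate_bps(1_000_000, 0.020)
leftover = link_capacity_bps - rate  # bandwidth that stays free for non-deadline flows
print(f"deadline flow needs {rate / 1e6:.0f} Mbps, leaving {leftover / 1e6:.0f} Mbps")
```

Sending at this rate instead of at line rate is what leaves headroom for non-deadline flows in lower priorities.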

  10. Reducing FCT for non-deadline flows
      • Mimicking SJF
      • Non-deadline flows with or without known sizes

  11. Non-deadline flows with unknown size
      • PIAS [2] is the best-known scheme.
      • Highest priority: send packets tagged with the highest priority until α_1 bytes are sent.
      • 2nd highest priority: send packets tagged with the 2nd highest priority until α_2 bytes are sent.
      • …
      • Lowest priority: send packets tagged with the lowest priority.
      [2] Wei Bai, et al., Information-Agnostic Flow Scheduling for Commodity Data Centers, USENIX NSDI 2015
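
The demotion rule on this slide can be sketched as a lookup on the flow's bytes-sent counter. This is a minimal illustration, not the PIAS implementation; the function name and the threshold values are hypothetical.

```python
from bisect import bisect_right
from typing import Sequence


def pias_priority(bytes_sent: int, demotion_thresholds: Sequence[int]) -> int:
    """Return the priority (1 = highest) for the next packet of a flow.

    A flow starts at priority 1 and is demoted by one level each time its
    bytes-sent counter reaches the next threshold (thresholds are sorted).
    """
    # Number of thresholds already reached == number of demotions so far.
    return bisect_right(demotion_thresholds, bytes_sent) + 1


# Hypothetical thresholds alpha_1 = 100 KB and alpha_2 = 1 MB (three priorities).
thresholds = [100_000, 1_000_000]
for sent in (0, 99_999, 100_000, 5_000_000):
    print(sent, "->", pias_priority(sent, thresholds))
```

Short flows finish before reaching the first threshold and so stay in the highest priority, which is how the scheme approximates SJF without knowing flow sizes.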

  12. Karuna for non-deadline flows
      • Non-deadline flows with unknown size ← PIAS
      • Non-deadline flows with known size
        • Karuna extends PIAS to schedule flows with or without known sizes.
      Sum of Linear Ratios Problem (PIAS): demotion thresholds {𝛽_i}
        → reformulation to include flows with known sizes →
      Quadratic Sum of Ratios Problem (Karuna): demotion thresholds {𝛽_i} and splitting thresholds {𝛾_i}

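  13. Karuna for non-deadline flows: mimicking SJF
      • Flows with known sizes: split by size into ranges (Size ≤ 𝛾_1; 𝛾_1 < Size ≤ 𝛾_2; …; 𝛾_{K-1} < Size) and mapped to Priority 2 through Priority K.
      • Flows without known sizes: PIAS.
      [Diagram: end-host classification feeding Priority 2, Priority 3, …, Priority K in the network fabric.]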
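
Putting the pieces together, priority assignment at the end-host might look like the sketch below. It assumes priority 1 is reserved for deadline flows, known-size flows get a fixed priority from the splitting thresholds, and unknown-size flows are demoted PIAS-style starting from priority 2. The function name, the exact range-to-priority mapping, and the threshold values are assumptions for illustration, not taken from the Karuna implementation.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence

DEADLINE_PRIORITY = 1  # highest priority, assumed reserved for deadline flows


def karuna_priority(
    has_deadline: bool,
    size: Optional[int],
    bytes_sent: int,
    splitting: Sequence[int],
    demotion: Sequence[int],
) -> int:
    """Return the priority (1 = highest) for the next packet of a flow."""
    if has_deadline:
        return DEADLINE_PRIORITY
    if size is not None:
        # Known size: fixed priority. size <= gamma_1 maps to priority 2,
        # gamma_1 < size <= gamma_2 maps to priority 3, and so on.
        return 2 + bisect_left(splitting, size)
    # Unknown size: start at priority 2 and demote as bytes are sent.
    return 2 + bisect_right(demotion, bytes_sent)


# Hypothetical thresholds (two of each, giving priorities 2 to 4).
gammas = [10_000, 1_000_000]
betas = [50_000, 2_000_000]
print(karuna_priority(True, 500, 0, gammas, betas))         # deadline flow
print(karuna_priority(False, 10_001, 0, gammas, betas))     # known size
print(karuna_priority(False, None, 60_000, gammas, betas))  # unknown size
```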

  14. Implementation

  15. Implementation
      • Information passing
        • Pass flow information (deadline, size) to the kernel using SO_MARK.
      [Diagram: at the end-host, flow size and deadline are set on the socket through setsockopt() with SO_MARK, ahead of the network fabric.]
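
As a rough illustration of information passing, an application could pack the deadline and size into the integer socket mark and set it with setsockopt(). The slides do not give the bit layout, so the encoding below (11 bits of deadline in ms, 20 bits of size in KB) and all function names are hypothetical. SO_MARK is Linux-only, and setting it normally requires the CAP_NET_ADMIN capability.

```python
import socket

# Hypothetical layout, kept to 31 bits so the mark fits a signed C int.
DEADLINE_BITS = 11  # deadline in milliseconds, 0..2047
SIZE_BITS = 20      # flow size in kilobytes, 0..1048575


def encode_mark(deadline_ms: int, size_kb: int) -> int:
    """Pack (deadline, size) into one integer mark."""
    if not 0 <= deadline_ms < (1 << DEADLINE_BITS):
        raise ValueError("deadline_ms out of range")
    if not 0 <= size_kb < (1 << SIZE_BITS):
        raise ValueError("size_kb out of range")
    return (deadline_ms << SIZE_BITS) | size_kb


def decode_mark(mark: int) -> tuple:
    """Inverse of encode_mark: return (deadline_ms, size_kb)."""
    return mark >> SIZE_BITS, mark & ((1 << SIZE_BITS) - 1)


def mark_socket(sock: socket.socket, deadline_ms: int, size_kb: int) -> None:
    """Attach the flow information to the socket (Linux, needs CAP_NET_ADMIN)."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_MARK,
                    encode_mark(deadline_ms, size_kb))
```

A kernel-side module would then read the mark from each packet's socket buffer and apply the inverse decoding.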

  16. Implementation
      • Information passing
      • Packet tagging
        • TC module at the sender side.
        • Tag DSCP fields in packet headers based on thresholds.
      [Diagram: socket (SO_MARK via setsockopt()) → pkt → TC module → tagged pkt → network fabric.]
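
For reference, the DSCP field is the upper six bits of the IP TOS/Traffic Class byte, and the lower two bits carry ECN. The sketch below shows only that bit layout in user-space Python; the actual tagging happens in a kernel TC module, and the priority-to-DSCP mapping here is a hypothetical choice.

```python
def apply_dscp(tos_byte: int, dscp: int) -> int:
    """Return a TOS/Traffic Class byte carrying `dscp`, keeping the two ECN bits."""
    if not 0 <= tos_byte <= 0xFF:
        raise ValueError("tos_byte must fit in one byte")
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0..63)")
    ecn_bits = tos_byte & 0x03  # preserved so ECN-based congestion control still works
    return (dscp << 2) | ecn_bits


def dscp_for_priority(priority: int) -> int:
    """Hypothetical mapping: priority 1 (highest) .. 8 (lowest) -> DSCP 7 .. 0."""
    if not 1 <= priority <= 8:
        raise ValueError("priority must be in 1..8")
    return 8 - priority


# Tag a packet whose ECN bits are 0b10 with the highest priority.
print(bin(apply_dscp(0b00000010, dscp_for_priority(1))))
```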

  17. Implementation
      • Information passing
      • Packet tagging
      • Rate control
        • TC module.
        • Non-deadline flows use DCTCP.
        • Modifies the congestion window size using MCP.
      [Diagram: socket (SO_MARK via setsockopt()) → DCTCP → pkt → TC module (modulates the congestion window with MCP) → tagged pkt → network fabric.]

  18. Implementation
      • Information passing
      • Packet tagging
      • Rate control
      • Switch configuration
        • ECN marking.
        • Priority queueing (priorities mapped to DSCP fields).
      [Diagram: socket (SO_MARK via setsockopt()) → DCTCP → pkt → TC module (modulates the congestion window with MCP) → tagged pkt → strict priority queueing at the switches in the network fabric.]
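
Strict priority queueing at a switch port can be modeled in a few lines: the port always serves the highest-priority non-empty queue. This is a toy model for intuition only; the class name is hypothetical and ECN marking is left out.

```python
from collections import deque


class StrictPriorityPort:
    """Toy model of one switch egress port with strict priority queues."""

    def __init__(self, num_queues: int = 8):
        # Index 0 holds priority 1 (highest); the last index holds the lowest.
        self.queues = [deque() for _ in range(num_queues)]

    def enqueue(self, priority: int, packet) -> None:
        self.queues[priority - 1].append(packet)

    def dequeue(self):
        """Serve the first non-empty queue, scanning from highest priority."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None  # nothing to send


port = StrictPriorityPort()
port.enqueue(3, "non-deadline, large")
port.enqueue(1, "deadline")
port.enqueue(2, "non-deadline, small")
while (pkt := port.dequeue()) is not None:
    print(pkt)
```

Because lower queues are served only when higher ones are empty, deadline traffic that sends at a minimal rate leaves the remaining capacity to the queues below it.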

  19. Evaluation
      • Testbed experiments
      • Simulations

  20. Evaluation: Testbed Experiments
      • Setup
        • 16 servers
        • A Gigabit Pronto-3295 switch
        • 8 priority queues mapped to DSCP
        • RTT ~100us
        • Karuna kernel module
      • Traffic traces
        • Web search (DCTCP [3])
        • Data mining (VL2 [4])
      [3] Alizadeh, Mohammad, et al. "Data center TCP (DCTCP)." ACM SIGCOMM Computer Communication Review, Vol. 40, No. 4. ACM, 2010.
      [4] Greenberg, Albert, et al. "VL2: a scalable and flexible data center network." ACM SIGCOMM Computer Communication Review, Vol. 39, No. 4. ACM, 2009.

  21. Testbed Experiments: Deadline Flows
      Flow   Size     Deadline   Start Time
      1      14.4MB   20ms       0ms
      2      48MB     120ms      0ms
      3      3MB      5ms        50ms
      4      0.5MB    10ms       80ms
      [Figure: DCTCP. Throughput (Mbps, 0 to 1200) vs. time (ms, 1 to 121) for Flows 1 to 4. Annotation: deadline missed for Flow 1.]

  22. Testbed Experiments: Deadline Flows
      (Same four flows as on slide 21.)
      [Figure: pFabric with Earliest Deadline First. Throughput (Mbps, 0 to 1200) vs. time (ms, 1 to 121) for Flows 1 to 4, with each flow's deadline marked on the time axis.]

  23. Testbed Experiments: Deadline Flows
      (Same four flows as on slide 21.)
      [Figure: Karuna. Throughput (Mbps, 0 to 1000) vs. time (ms, 1 to 121) for Flows 1 to 4.]

  24. Testbed Experiments: Deadline Flows
      (Same four flows as on slide 21.)
      [Figure: three side-by-side panels of throughput (Mbps) vs. time (ms, 1 to 120): Karuna, pFabric with Earliest Deadline First, and DCTCP.]
      Karuna completes each deadline flow just before its deadline, leaving bandwidth for non-deadline flows.

  25. Testbed Experiments: Non-deadline Flows
      [Figure: four bar charts of FCT (ms) comparing Karuna, DCTCP, and TCP by flow size: 0-100KB (Avg), 0-100KB (99th), 100KB-10MB (Avg), >10MB (Avg), and Overall.]
      Mimics shortest job first scheduling for non-deadline flows.

  26. Evaluation: Simulations
      • Simulation setup
        • Spine-leaf with 144 servers
        • 10G server-ToR links
        • 40G ToR-spine links
      • Compare with:
        • D3
        • D2TCP
        • pFabric - EDF
