

  1. ApproSync: Approximate State Synchronization for Programmable Networks
 Xiang Chen, Qun Huang, Dong Zhang, Haifeng Zhou, Chunming Wu

  2. Control Plane (CP): applications, policies, states. Data Plane (DP): programmable switches processing packets.

  3. State: Historical Packet Processing Information
 e.g., a Count-Min Sketch running on a Tofino switch. State = set of counter values; a state value = a counter value.
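To make the notion of state concrete, here is a minimal Python sketch of a Count-Min Sketch whose state is a d x w array of counters; this is an illustrative model, not the switch implementation, and all names are hypothetical.

    import hashlib

    class CountMinSketch:
        """Minimal Count-Min Sketch model: the state is a d x w array of counters."""

        def __init__(self, d=3, w=4):
            self.d = d                                   # rows (one hash function per row)
            self.w = w                                   # counters per row
            self.counters = [[0] * w for _ in range(d)]  # the per-switch state

        def _index(self, row, key):
            # Derive a per-row index for the key (any per-row hash works here).
            digest = hashlib.sha256(f"{row}:{key}".encode()).hexdigest()
            return int(digest, 16) % self.w

        def update(self, key, inc=1):
            # Each packet increments one counter per row; these counters are the
            # "state values" that later have to be synchronized to the CP.
            for row in range(self.d):
                self.counters[row][self._index(row, key)] += inc

        def query(self, key):
            # Estimated count = minimum over the d counters touched by the key.
            return min(self.counters[row][self._index(row, key)] for row in range(self.d))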

  4. State Sync: Making States in CP and DP Consistent
 1. Bottom-Up Sync: State Read (DP → CP) of the data plane states kept in switch ASICs.

  5. State Sync: Making States in CP and DP Consistent
 1. Bottom-Up Sync: State Read (DP → CP) of the data plane states kept in switch ASICs.
 2. Top-Down Sync: State Write (CP → DP) of policies from applications down to the programmable switches.
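As a minimal illustration of the two directions, the sketch below models CP and DP states as dictionaries mapping counter IDs to values; the function names are hypothetical.

    def bottom_up_sync(dp_counters, cp_counters):
        # State Read (DP -> CP): pull the switch's counter values into the controller's view.
        for loc, val in dp_counters.items():
            cp_counters[loc] = val

    def top_down_sync(cp_counters, dp_counters):
        # State Write (CP -> DP): push controller-chosen values back into the switch.
        for loc, val in cp_counters.items():
            dp_counters[loc] = val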

  6. Requirements
 1. Low latency for latency-sensitive apps (e.g., anomaly detection): complete state sync within a small time.
 2. High accuracy for apps to make correct decisions: minimize the state divergence (i.e., difference) between CP and DP.

  7. Limitations of Existing Solutions (Switch OS): High Latency
 The switch OS syncs state values via PCIe and TCP, whose bandwidth is below 100 Gbps, while transferring all state updates requires well over 100 Gbps and incurs high resource consumption.

  8. Limitations of Existing Solutions (Switch OS)
 Our benchmark: collecting 2^16 counter values via the OS of a Tofino switch takes >10 s.

  9. Limitations of Existing Solutions (Traffic Mirroring): State Loss
 Mirroring state values to the CP achieves low latency by bypassing the switch OS, but suffers state loss due to limited link capacity.

  10. Limitations of Existing Solutions (Traffic Mirroring)
 Our benchmark: collecting 2^16 state values under a 40-120 Gbps input traffic rate, with a 40 Gbps link for state transfer, loses up to 60% of the state.

  11. Impact on Applications (Heavy Hitter Detection)
 Collecting a hash table with 2^16 entries from a Tofino switch: (a) impact of high latency; (b) impact of state loss. High latency and state loss seriously degrade app accuracy.

  12. Can we achieve both Low Latency and High Accuracy?
 Low Latency: OS bypassing, i.e., sync states directly between switch ASICs and the CP (w/o invoking the OS).

  13. Can we achieve both Low Latency and High Accuracy?
 Low Latency: OS bypassing, i.e., sync states directly between switch ASICs and the CP (w/o invoking the OS).
 High Accuracy: state loss arises from limited link capacity (tens of Gbps) and switch limitations (e.g., <10 MB memory).
 Challenge: how to handle state loss under these limitations?

  14. Observation: Applications often tolerate a small state divergence (e.g., <1%)
 e.g., DP value v1 = 100; CP value v2 = 99; divergence rate = |v1 - v2| / v1 × 100% = 1%.
 For heavy hitter, UDP flood, and superspreader detection:

  15. Observation: Applications often tolerate a small state divergence (e.g., <1%)
 e.g., DP value v1 = 100; CP value v2 = 99; divergence rate = |v1 - v2| / v1 × 100% = 1%.
 For heavy hitter, UDP flood, and superspreader detection: state divergence < 1% → app-level error < 2%.
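A one-line Python rendering of the divergence-rate arithmetic above, purely illustrative:

    def divergence_rate(dp_value, cp_value):
        # Relative divergence between the DP value and the CP's view of it, in percent.
        return abs(dp_value - cp_value) / dp_value * 100.0

    print(divergence_rate(100, 99))  # -> 1.0, matching the slide's example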

  16. ApproSync — Approximate State Sync
 1. Bypass the switch OS → Low Latency.
 2. Allow a small divergence (error) → Low Resource Consumption → No State Loss → High Accuracy.
 Comparison: switch OS has high latency and full accuracy; traffic mirroring has low latency and low accuracy; ApproSync has low latency and high accuracy.

  17. ApproSync — Approximate State Sync
 Design#1: Hash Table in Switch ASIC
 1. Aggregate state updates with the same location. Example on a d = 3, w = 4 counter array in the switch ASIC: Packet A produces Update#1 ((1,1), 1), changing the value at (1,1) to 1; Packet B produces Update#2 ((1,1), 2), changing the value at (1,1) to 2.

  18. ApproSync — Approximate State Sync
 Design#1: Hash Table in Switch ASIC
 1. Aggregate state updates with the same location. If all updates (Update#1 ((1,1), 1) and Update#2 ((1,1), 2)) were sent, the link would saturate and state would be lost.

  19. ApproSync — Approximate State Sync
 Design#1: Hash Table in Switch ASIC
 1. Aggregate state updates with the same location. With aggregation by the hash table, only the aggregated update ((1,1), 2) is sent to the CP.

  20. ApproSync — Approximate State Sync
 Design#1: Hash Table in Switch ASIC
 1. Aggregate state updates with the same locations.
 2. Bound the state divergence between DP and CP: for DP value v1 and CP value v2, the state divergence is div = |v1 - v2|, and it is bounded so that div = |v1 - v2| ≤ threshold t.
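A minimal software sketch of Design#1, assuming each hash-table entry keeps (Val = latest DP value, Old = last value sent to the CP), as defined on the next slides; the class and callback names are hypothetical, and the real logic runs in the switch ASIC, not in Python.

    class ApproxSyncTable:
        """Software model of the ASIC hash table: aggregate updates per location,
        sync an entry to the CP only when its divergence exceeds the threshold t."""

        def __init__(self, threshold, send_to_cp):
            self.t = threshold            # tolerated divergence |Val - Old|
            self.send_to_cp = send_to_cp  # callback that ships one aggregated update to the CP
            self.entries = {}             # loc -> (Val, Old)

        def update(self, loc, new_val):
            val, old = self.entries.get(loc, (0, 0))
            val = new_val                  # aggregation: a later update to the same loc overwrites the earlier one
            if abs(val - old) > self.t:
                self.send_to_cp(loc, val)  # divergence too large: sync this entry
                old = val                  # the CP now holds the latest value
            self.entries[loc] = (val, old)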

  21. Example of Hash Table (threshold t = 1)
 Hash Table H: Loc = counter ID; Val = latest state value in DP; Old = last state value sent to CP (i.e., the value in CP). Initially all entries are (0, 0, 0).
 Switch ASIC: Value[1] = 0, Value[2] = 0. Controller: Value[1] = 0, Value[2] = 0.

  22. Example of Hash Table (threshold t = 1)
 Update (1, 1) arrives: H[1].Val is set to 1, so H[1] = (Loc 1, Val 1, Old 0).
 Switch ASIC: Value[1] = 1, Value[2] = 0. Controller: Value[1] = 0, Value[2] = 0.

  23. Example of Hash Table (threshold t = 1)
 State divergence div = |Val - Old| = |1 - 0| = 1 ≤ t: no need to sync, since the divergence is small.
 Switch ASIC: Value[1] = 1, Value[2] = 0. Controller: Value[1] = 0, Value[2] = 0.

  24. Example of Hash Table (threshold t = 1)
 Update (1, 2) arrives: H[1].Val is set to 2, aggregated with the previous update, so H[1] = (Loc 1, Val 2, Old 0).
 Switch ASIC: Value[1] = 2, Value[2] = 0. Controller: Value[1] = 0, Value[2] = 0.

  25. Example of Hash Table (threshold t = 1)
 div = |Val - Old| = |2 - 0| = 2 > t: sync H[1], since the divergence is large. The aggregated update (1, 2) is sent to the CP.
 Switch ASIC: Value[1] = 2, Value[2] = 0. Controller: Value[1] = 2, Value[2] = 0.
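Replaying this walkthrough with the hypothetical ApproxSyncTable sketch from Design#1 gives the same behavior:

    synced = []
    table = ApproxSyncTable(threshold=1, send_to_cp=lambda loc, val: synced.append((loc, val)))

    table.update(1, 1)   # div = |1 - 0| = 1 <= t: no sync, the controller still holds Value[1] = 0
    table.update(1, 2)   # div = |2 - 0| = 2 >  t: sync, the controller now holds Value[1] = 2

    print(synced)        # -> [(1, 2)]  (one aggregated update instead of two)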

  26. Example of Hash Table (threshold t = 1)
 Takeaway#1: w/o the hash table, all state updates are synced; w/ the hash table, only one aggregated update is synced, reducing link load by 50% in this example. The hash table can reduce link load.
 Takeaway#2: the state divergence (div) stays ≤ the threshold t = 1.
 (After the sync, H[1] = (Loc 1, Val 2, Old 2).)

  27. ApproSync — Approximate State Sync
 Design#1: Hash Table in Switch ASIC. 1. Aggregate state updates with the same locations. 2. Allow a small state divergence to reduce link load.
 Design#2: Rate Control in Switch ASIC. Adaptively tune the threshold t w.r.t. the incoming traffic rate (a sketch of the idea follows below).
 Design#3: Reliable and Atomic State Write. Please refer to our paper :-)
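The slides do not spell out the tuning rule for Design#2, so the following is only an assumed illustration of the idea (raise t when incoming traffic threatens to overload the sync link, lower it when there is headroom); the function, parameters, and rule are hypothetical, not the paper's algorithm.

    def tune_threshold(t, incoming_rate_gbps, sync_link_gbps, t_min=1, t_max=64):
        # Assumed rule: under heavy traffic, tolerate more divergence (larger t)
        # so fewer updates are synced; otherwise tighten t to improve accuracy.
        if incoming_rate_gbps > sync_link_gbps:
            return min(t * 2, t_max)    # back off under heavy traffic
        return max(t // 2, t_min)       # tighten when the sync link has headroom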

  28. Implementation
 ApproSync is written in the P4 language and runs on Tofino switches. It supports State Read and State Write, a protocol for state transfer, and the corresponding workflow in the switch ASIC.
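The wire format of the state-transfer protocol is not given in the slides; the sketch below merely illustrates one plausible encoding of an aggregated update (loc, val) as a fixed-size record, and the field layout is an assumption.

    import struct

    # Assumed record for one aggregated state update: 32-bit location, 64-bit value,
    # in network byte order. Not the actual ApproSync packet format.
    UPDATE_FMT = "!IQ"

    def encode_update(loc, val):
        return struct.pack(UPDATE_FMT, loc, val)

    def decode_update(buf):
        return struct.unpack(UPDATE_FMT, buf)

    msg = encode_update(1, 2)   # the aggregated update (1, 2) from the walkthrough
    print(decode_update(msg))   # -> (1, 2)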
