fault tolerant service function chaining

Fault Tolerant Service Function Chaining M. GHAZNAVI, E. JALALPOUR, - PowerPoint PPT Presentation



  1. Fault Tolerant Service Function Chaining
     M. GHAZNAVI, E. JALALPOUR, B. WONG, R. BOUTABA, A. MASHTIZADEH
     UNIVERSITY OF WATERLOO

  2. Middleboxes and Service Function Chains
     [Diagram: Firewall, IDS, and NAT chained on the path to the Internet]

  3. Middlebox Failures
     [Bar chart: percent contribution to high-severity incidents and to population, for L2 switches, L3 routers, middleboxes, and others]
     Source: Demystifying the dark side of the middle: A field study of middlebox failures in datacenters, IMC 2013

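  4. Middlebox Fault Tolerance
     [Diagram: Alice and Bob reach the Internet through a NAT; one NAT is marked as failed (✕) and a second NAT is shown]

  5. Consistent State Replication
     [Diagram: Alice and Bob reach the Internet through a NAT with a second NAT alongside; both NATs hold the entries Bob ⬄ Bing and Alice ⬄ Apple]

  6. Consistent State Replication
     [Diagram: same setup; one NAT holds Bob ⬄ Bing and Alice ⬄ Apple, the other holds only Bob ⬄ Bing]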

  7. Previous Approaches
     EXTERNALIZED STATE
     ◦ StatelessNF, NSDI 2017
     ◦ CHC, NSDI 2019
     SNAPSHOT BASED
     ◦ Pico Replication, SoCC 2013
     ◦ FTMB, SIGCOMM 2015
     ◦ REINFORCE, CoNEXT 2018

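  8. Externalized State Approach
     [Diagram: a NAT on the path to the Internet reads and writes its state in a fault-tolerant data store]

  9. Snapshot Based Approaches
     [Diagram: Alice and Bob reach the Internet through a NAT with a second NAT alongside; the entries Bob ⬄ Bing and Alice ⬄ Apple appear as primary state and as replicated state]

  10. Snapshot Based Approaches for a Chain
      [Diagram: a FW, IDS, NAT chain with a second copy of each middlebox; primary and replicated state are marked, with the annotations "High Latency" and "Low Throughput"]

  11. Our Approach
      [Diagram: a fault-tolerant chain of Firewall, IDS, and NAT, with primary state (F, I, N) and replicated state (F’, I’, …) marked]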

  12. Goals
      Consistent state replication to tolerate f middlebox failures
      Minimizing performance overhead during normal operation
      Minimizing disruption during middlebox failures

  13. Fault Tolerant Chaining (FTC)
      In-chain replication
      ◦ Replicates a chain’s state instead of the state of individual middleboxes
      ◦ Each middlebox’s state is replicated to the subsequent f middlebox servers
      Transactional packet processing
      ◦ Simplifies the development of multi-threaded middleboxes
      ◦ Improves scalability and performance
      Data dependency vectors
      ◦ Enable concurrent state replication
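
The in-chain replication idea on this slide can be illustrated with a minimal sketch. It assumes the replication count is a parameter called `f` here, and that placement wraps around at the end of the chain so the last middleboxes replicate onto servers at the start; both the wrap-around and the helper name `replica_hosts` are illustrative assumptions, not details taken from the slides.

```python
def replica_hosts(chain, f):
    """Map each middlebox to the f servers that follow it in the chain.

    Assumption (not stated on the slide): positions wrap around, so the
    last middleboxes replicate onto servers at the start of the chain.
    """
    n = len(chain)
    if not 0 <= f < n:
        raise ValueError("f must be smaller than the chain length")
    return {
        mb: [chain[(i + k) % n] for k in range(1, f + 1)]
        for i, mb in enumerate(chain)
    }


placement = replica_hosts(["firewall", "ids", "nat"], f=1)
print(placement)
# {'firewall': ['ids'], 'ids': ['nat'], 'nat': ['firewall']}
```

With `f=1`, every middlebox's state has one copy on the server of the next middlebox, so no extra replica servers appear in the mapping.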

  14. Normal Operation
      [Diagram: middleboxes m1, m2, m3 with replicas r1, r2, r3, plus a forwarder (Forw.) and a buffer; numbered markers 1 to 3 appear along the chain]

  15. Normal Operation
      [Same diagram, next step]

  16. Normal Operation
      [Same diagram, next step]

  17. Failure Recovery
      [Diagram: the same chain with one middlebox marked as failed (✕); m2's primary state and the replicated state r’2 are labeled]

  18. Transactional Packet Processing
      Existing approaches
      ◦ Single-threaded or batched packet processing
      ◦ FTMB: multi-threaded packet processing
      ◦ Tracking state changes at the granularity of each state variable read/write
      ◦ Frequent periodic state snapshots
      Our approach
      ◦ Packet transaction model for concurrent packet processing
      ◦ Using the isolation property to track state changes at the granularity of packet transactions
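
A rough sketch of what tracking state changes per packet transaction could look like. The class `PartitionedState` and the method `run_transaction` are hypothetical names, and acquiring per-partition locks in sorted order is an assumption made here to keep the example deadlock-free; it is not presented as the authors' concurrency control scheme.

```python
import threading


class PartitionedState:
    """Middlebox state split into partitions, each guarded by its own lock."""

    def __init__(self, num_partitions):
        self.values = [0] * num_partitions
        self.locks = [threading.Lock() for _ in range(num_partitions)]

    def run_transaction(self, partitions, body):
        """Run body(values) while holding the locks of the given partitions.

        Returns the (partition, new value) pairs the transaction changed,
        i.e. one change record per packet transaction.
        """
        ordered = sorted(set(partitions))
        for p in ordered:  # fixed lock order avoids deadlock between threads
            self.locks[p].acquire()
        try:
            before = {p: self.values[p] for p in ordered}
            body(self.values)
            return [(p, self.values[p]) for p in ordered
                    if self.values[p] != before[p]]
        finally:
            for p in reversed(ordered):
                self.locks[p].release()


state = PartitionedState(4)


def count_packet(values):
    values[2] += 1  # e.g. a per-flow packet counter kept in partition 2


log = state.run_transaction([2], count_packet)
print(log)  # [(2, 1)]
```

Because each transaction runs in isolation on the partitions it declares, the returned list is a complete record of what that packet changed, with no need to log individual variable accesses.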

  19. Data Dependency Vectors
      Tracking data changes instead of thread operations
      Enabling different numbers of threads at the middlebox and replicas
      ◦ Fail over to a smaller machine
      ◦ Scale up to a larger machine

      | Middlebox     | Product            | Throughput | CPU cores |
      |---------------|--------------------|------------|-----------|
      | IPSec         | HP VSR1001         | 268 Mbps   | 1         |
      | IPSec         | HP VSR1008         | 926 Mbps   | 8         |
      | WAN Optimizer | STEELHEAD CCX770M  | 10 Mbps    | 2         |
      | WAN Optimizer | STEELHEAD CCX1555M | 50 Mbps    | 4         |
      | WAF           | Barracuda Level 1  | 100 Mbps   | 1         |
      | WAF           | Barracuda Level 5  | 200 Mbps   | 2         |

  20. Data Dependency Vectors Example
      [Diagram: a middlebox and a replica each keep a dependency vector, starting at ⟨0,3,4⟩. Packet 1 performs W(1) and carries ⟨0,x,x⟩; packet 2 performs R(1), W(3) and carries ⟨1,x,4⟩; the middlebox's vector advances to ⟨1,3,4⟩ and then ⟨2,3,5⟩. At the replica, a packet is applied when the replica's vector ≥ the packet's vector (✓) and held otherwise.]
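
The comparison in this example can be sketched in a few lines, using the vectors that appear on the slide. The function names, the use of `None` for the "x" (don't care) entries, and the rule that applying a packet advances each touched entry by one are assumptions inferred from the slide's numbers; the sketch tracks only the vectors, not the state updates themselves.

```python
X = None  # "don't care" entry, written x on the slide


def can_apply(replica_vec, packet_vec):
    """True when the replica's vector is >= the packet's vector on every
    entry the packet cares about."""
    return all(p is X or r >= p for r, p in zip(replica_vec, packet_vec))


def apply_update(replica_vec, packet_vec):
    """Advance the entries the packet touched (assumed: by one past the
    value the packet carried)."""
    return [r if p is X else max(r, p + 1)
            for r, p in zip(replica_vec, packet_vec)]


def replicate(replica_vec, arrivals):
    """Apply packets in arrival order, holding those that are not ready."""
    vec, held, applied = list(replica_vec), [], []
    for name, pvec in arrivals:
        held.append((name, pvec))
        progress = True
        while progress:  # retry held packets after every successful apply
            progress = False
            for item in list(held):
                if can_apply(vec, item[1]):
                    vec = apply_update(vec, item[1])
                    applied.append(item[0])
                    held.remove(item)
                    progress = True
    return vec, applied


# Packet 2 (R(1), W(3)) reaches the replica before packet 1 (W(1)).
final_vec, order = replicate([0, 3, 4],
                             [("p2", [1, X, 4]), ("p1", [0, X, X])])
print(final_vec, order)  # [2, 3, 5] ['p1', 'p2']
```

Packet 2 is held because the replica's first entry is 0 and the packet requires 1; once packet 1 is applied the replica reaches ⟨1,3,4⟩, packet 2 becomes applicable, and the replica ends at ⟨2,3,5⟩, matching the middlebox.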

  21. Evaluation
      METHOD
      Comparing FTC with:
      NF, Non-Fault tolerant system
      ◦ Ideal performance
      FTMB (SIGCOMM 2015)
      ◦ State logging + Snapshots
      FTMB + Snapshot (SIGCOMM 2015)
      ◦ State logging + Snapshots
      ENVIRONMENTS
      A cluster of 12 servers
      ◦ 40 Gbps network
      SAVI Cloud environment
      ◦ A virtual network of OVS switches
      MoonGen and pktGen traffic generators
      ◦ UDP traffic
      ◦ Packet size: 256 B

  22. Fault Tolerant NATs
      [Plot: throughput (Mpps) versus number of threads (1, 2, 4, 8) for NF, FTC, and FTMB; annotated "2x higher throughput"]

  23. Fault Tolerant Chains – Throughput
      [Plot: throughput (Mpps) versus chain length (2 to 5) for NF, FTC, FTMB, and FTMB+Snapshot; annotated "3.5x higher throughput", "1.8x higher throughput", and "39% drop due to snapshots"]

  24. Fault Tolerant Chains – Latency
      [Plot: CDF of packets versus latency (µs), from 40 to 180 µs, for NF, FTC, and FTMB]

  25. Conclusion
      Keeps a chain of middleboxes online after f middleboxes fail
      ◦ In-chain replication
      ◦ Transactional packet processing
      ◦ Data dependency vectors
