
Validating Pre-commit Network Configuration Changes at Scale with Batfish and Ansible

Presenters:
 • Samir Parikh, Head of Product, Intentionet
 • Andrius Benokraitis, Principal Product Manager, Ansible Network Automation
 • Ratul Mahajan, CEO, Co-Founder, Intentionet


  1. Validating Pre-commit Network Configuration Changes at Scale with Batfish and Ansible
 • Samir Parikh, Head of Product, Intentionet (samir@intentionet.com)
 • Andrius Benokraitis, Principal Product Manager, Ansible Network Automation (andriusb@redhat.com)
 • Ratul Mahajan, CEO, Co-Founder, Intentionet (ratul@intentionet.com)

  2. Upcoming events
 • Oslo, Norway: November 13, 2018
 • Tampa, FL, USA: November 14, 2018
 • Moscow, Russia: November 14, 2018
 • Stockholm, Sweden: November 15, 2018
 • Johannesburg, S. Africa: November 27, 2018
 • Antwerp, Belgium: December 4, 2018
 For more information or to register visit: ansible.com/automates

  3. Upcoming workshops
 • Houston, TX: November 7, 2018
 • Rochester, NY: November 7, 2018
 • Portland, OR: November 6, 2018
 • Sunnyvale, CA: November 15, 2018
 • Charleston, SC: November 27, 2018
 • Seattle, WA: November 7, 2018
 For more information or to register visit: ansible.com/workshops

  4. WHAT WE’RE TALKING ABOUT TODAY
 • Part I: Network validation today
 • Part II: Comprehensive pre-commit validation with Batfish
 • Part III: Demo of Ansible + Batfish
 • Q/A

  5. Intentionet: Who are we?
 Company
 • Founded: 2015
 • Headquarters: Seattle, WA
 • Funding: NSF, True Ventures
 Mission
 • Enable organizations to build networks with security and reliability guarantees

  6. What are we building?
 Batfish: a comprehensive network validation solution
 • Open source under Apache 2.0 License
 • Growing user community
    • Multiple Fortune 500 companies
 • Growing developer community
    • Intentionet, Princeton, BBN, Microsoft, and others

  7. Why are we building it?
 Automation without validation is risky
 • Automation enables scale, consistency and speed
 • But not correctness: a single typo can bring down the entire network
 “To err is human; to propagate errors massively at scale requires automation”
 • Effective change validation is crucial in automated workflows

  8. What makes change validation effective?
 • Performed pre-deployment & automated
 • Production scale & covers ALL possible flows, failures and routes

  9. Validation methods in use today (assessed on: Pre-Deployment? Comprehensive?)
 Text Analysis: ‘presence’ check for configuration attributes
 • NTP server
 • DNS server
 • AAA setting
 Limitations
 • Cannot validate network behavior
 • Brittle and vendor specific

  10. Validation methods in use today (continued)
 Emulation: check specific network behaviors
 • Are all BGP sessions up?
 • Can a test client reach the DNS server?
 • What happens if link X fails?
 Limitations
 • Not production scale
 • Cannot test ALL possible flows, failures, route updates

  11. Validation methods in use today (continued)
 Operational State Analysis: check operational state
 • Are all BGP sessions up?
 • Can a client reach the DNS server?
 • Does traceroute from X to Y succeed?
 Limitations
 • Cannot test ANY possible failures, route updates
 • Cannot test ALL possible flows

  12. Validation methods in use today (summary)
 Text Analysis, Emulation, and Operational State Analysis each fall short on at least one criterion:
 • No method can provide comprehensive, pre-deployment validation
 • A new approach is needed

  13. Introducing Model-Based validation (assessed on: Pre-Deployment? Comprehensive?)
 Model-Based Analysis: check ALL possible network behaviors
 • Can ANY flow go from Subnet A to B?
 • Can ALL clients reach DNS server?
 • Will ANY link failure disrupt service X?
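
 The question "Will ANY link failure disrupt service X?" can be illustrated with a toy model. The sketch below is not how Batfish works internally; it only shows the idea of checking every single-link failure against a model instead of testing one failure at a time. The link list is an assumption (a two-leaf, two-spine full mesh), since the slides do not give the actual wiring, and the function names are hypothetical.

```python
from collections import deque

# Assumed wiring: every leaf connects to every spine (not stated in the slides).
LINKS = frozenset({
    ("leaf-01", "spine-01"), ("leaf-01", "spine-02"),
    ("leaf-02", "spine-01"), ("leaf-02", "spine-02"),
})


def reachable(links, src, dst):
    """Breadth-first search over undirected links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False


def disruptive_single_failures(links, src, dst):
    """Return every link whose failure alone disconnects src from dst."""
    links = frozenset(links)
    return sorted(link for link in links
                  if not reachable(links - {link}, src, dst))


# An empty list means no single link failure separates the two leaves.
print(disruptive_single_failures(LINKS, "leaf-01", "leaf-02"))
```

 In this assumed full mesh, each leaf keeps one path through the other spine after any single failure, so the list is empty. A chain such as a-b-c would report both links.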

  14. Introducing Model-Based validation

  15. How Batfish works
 Inputs:
 • Network configs (physical / cloud)
 • Dynamic state

  16. How Batfish works
 Inputs:
 • Network configs (physical / cloud)
 • Dynamic state
 And many more…

  17. How Batfish works
 Inputs:
 • Network configs (physical / cloud)
 • Dynamic state
 • Network Policy
 Network models:
 • Vendor Neutral Configuration Model
    Interfaces:
      Ethernet0/0:
        InterfaceCost: 1,
        importPolicy: peer_in
        ……….
 • Mathematical Model of Routing / Network Behavior Model
    192.0.0.0 ≤ out.prefix
    out.prefix ≤ 192.1.0.0
    best.valid ⇒ out.lp = 120
    best.valid ⇒ out.ad = 20
    ………
 Analysis engine output:
 • Certifications:
    • All devices are password protected
 • Violations:
    • Subnets of Leaf-1 and Leaf-3 cannot communicate
    • rtr-y failure reduces availability

  18. Batfish Network Policies
 • Policies represent specific network behaviors you want to ensure hold true
 • Typical categories of policies would be:
    • Security
    • Reliability
    • Compliance

  19. Batfish Network Policies
 Security
 • No traffic must pass between Subnets A & B
 • All traffic between branch offices must be encrypted
 • No route announcement can disrupt internal traffic
 Reliability
 • No single link failure will cause an outage
 • DC Fabric must always have full Leaf to Leaf reachability
 • DNS servers must always be globally accessible
 Compliance
 • Device access restricted to secure communication methods only
 • All device settings must comply with site standards
 • No undefined references are allowed on any device
 Policy evaluation provides correctness guarantees for ALL possible packets, link failures, and route announcements

  20. How can you use Batfish?
 • Build a CI/CD pipeline
    • Proactive / pre-deployment validation
    • Continuous / post-deployment validation
 • Test specific network scenarios
    • Test DR (Disaster Recovery) plan
    • Test network maintenance MOP

  21. Pre-deployment change validation with Ansible and Batfish
 Workflow (from the slide diagram):
 • Author change
 • Ansible: generate configs
 • Configs stored in GitHub
 • Initiate test
 • PASS: Ansible deploys configs to production
 • FAIL: raise error
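
 The PASS/FAIL gate in this workflow can be sketched in a few lines. This is a minimal illustration and not the presenters' pipeline: the names `gate`, `failed_checks`, `ValidationFailed`, and the sample check `empty_configs` are all hypothetical, and a real pipeline would call Batfish where the sample check sits.

```python
class ValidationFailed(Exception):
    """Raised when a candidate change violates at least one policy."""


def failed_checks(candidate, checks):
    """Run every check and return {check name: violations} for the failures."""
    failures = {}
    for name, check in checks.items():
        violations = check(candidate)
        if violations:
            failures[name] = violations
    return failures


def gate(candidate, checks, deploy):
    """Deploy only if no check reports a violation; otherwise raise."""
    failures = failed_checks(candidate, checks)
    if failures:
        raise ValidationFailed(failures)
    deploy(candidate)
    return "PASS"


def empty_configs(candidate):
    """Hypothetical stand-in check: list hosts whose generated config is empty."""
    return sorted(host for host, text in candidate.items() if not text.strip())


CHECKS = {"no empty configs": empty_configs}
deployed = []  # stands in for the production deployment step

print(gate({"lhr-leaf-03": "hostname lhr-leaf-03"}, CHECKS, deployed.append))
try:
    gate({"lhr-leaf-03": ""}, CHECKS, deployed.append)
except ValidationFailed as err:
    print("FAIL:", err.args[0])
```

 The point of the design is that `deploy` is unreachable unless every check comes back clean, which matches the PASS and FAIL branches on the slide.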

  22. DEMO

  23. Demo
 Scenarios:
 1. Expand DC fabric by adding a new leaf
 2. Enable new service by updating whitelist on firewalls
 Topology (LHR DC):
 • border-01, border-02
 • fw-01, fw-02
 • spine-01, spine-02
 • leaf-01, leaf-02
 • Host subnets: 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, 10.1.4.0/24

  24. Scenario 1: Expand DC fabric
 • Add new leaf, lhr-leaf-03, to fabric in POD 1
 • Host subnet 10.1.5.0/24
 Topology (LHR DC):
 • border-01, border-02
 • fw-01, fw-02
 • spine-01, spine-02
 • leaf-01, leaf-02, leaf-03
 • Host subnets: 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, 10.1.4.0/24, 10.1.5.0/24

  25. Scenario 1 pipeline
 1. User input:
    • Leaf Name
    • POD ID
    • BGP ASN
 2. Generate configuration using Jinja2 templates
 3. Commit changes to git branch
 4. Initiate change validation with Batfish
 5. Log validation results to S3
 6. Notify user via Slack
 DC Base Policy
 • All routers must use TACACS server 1.2.3.4
 • All routers must use NTP servers 1.2.3.4, 1.2.3.5
 • There must NOT be any undefined references
 • There must NOT be any unused structures
 • There must NOT be any filters with unreachable lines
 DC Fabric Policy
 • All BGP sessions must be compatibly configured
 • All BGP sessions must be established
 • All host subnets on Leaf routers must be able to reach all other host subnets on leaf routers
 • All Leaf routers must use a unique BGP ASN
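
 Two of the policies above can be expressed as plain checks over per-device facts. The sketch below is an illustration, not Batfish's API: the fact layout, the helper `leaf`, the device names lhr-leaf-01 and lhr-leaf-02, and the ASN values are all assumptions made for the example. It also assumes "must use" means the listed servers are present, and it shows a duplicate ASN as one way an ASN input could be wrong.

```python
from collections import defaultdict

REQUIRED_NTP = {"1.2.3.4", "1.2.3.5"}   # from the DC Base Policy
REQUIRED_TACACS = {"1.2.3.4"}           # from the DC Base Policy


def base_policy_violations(devices):
    """Report devices that lack a required NTP or TACACS server."""
    violations = []
    for name in sorted(devices):
        facts = devices[name]
        if not REQUIRED_NTP <= set(facts["ntp_servers"]):
            violations.append(f"{name}: missing required NTP server")
        if not REQUIRED_TACACS <= set(facts["tacacs_servers"]):
            violations.append(f"{name}: missing required TACACS server")
    return violations


def duplicate_leaf_asns(devices):
    """Return {asn: [leaf names]} for every ASN used by more than one leaf."""
    by_asn = defaultdict(list)
    for name, facts in devices.items():
        if facts["role"] == "leaf":
            by_asn[facts["bgp_asn"]].append(name)
    return {asn: sorted(names) for asn, names in by_asn.items() if len(names) > 1}


def leaf(asn):
    """Build hypothetical facts for a compliant leaf with the given ASN."""
    return {"role": "leaf", "bgp_asn": asn,
            "ntp_servers": ["1.2.3.4", "1.2.3.5"],
            "tacacs_servers": ["1.2.3.4"]}


# Invented ASNs: the new leaf reuses the ASN of an existing leaf.
FABRIC = {"lhr-leaf-01": leaf(65001), "lhr-leaf-02": leaf(65002),
          "lhr-leaf-03": leaf(65002)}

print(base_policy_violations(FABRIC))
print(duplicate_leaf_asns(FABRIC))
```

 Here the base policy passes while the fabric policy flags the shared ASN, which is the kind of result that would fail the validation step before anything is deployed.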

  26. Scenario 1 recap
 Batfish automatically determined that the ASN input for the first candidate change was incorrect
 The error would have been extremely difficult to find otherwise:
 • All BGP sessions come up
 • All Spine routers have the correct routes
 Batfish returned no errors for the second candidate change
 • That is proof that the change is correct and safe

  27. Scenario 2: Enable web service
 • Enable web-service on hosts connected to lhr-leaf-03
 • Allow ANY IP (0.0.0.0/0) to reach web servers (tcp:80) in subnet 10.1.5.0/27
 Topology (LHR DC):
 • border-01, border-02
 • fw-01, fw-02
 • spine-01, spine-02
 • leaf-01, leaf-02, leaf-03
 • Host subnets: 10.1.1.0/24, 10.1.2.0/24, 10.1.3.0/24, 10.1.4.0/24, 10.1.5.0/24
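
 The intended whitelist entry can be modeled with Python's standard ipaddress module. This is a sketch under assumptions: the rule format, the first-match-then-deny semantics, and the function `permitted` are hypothetical and do not reflect the firewalls' real configuration. It evaluates one flow at a time, so it only spot-checks the rule rather than covering all flows as a model-based analysis would.

```python
import ipaddress

HOST_SUBNET = ipaddress.ip_network("10.1.5.0/24")   # subnet behind lhr-leaf-03
WEB_SERVERS = ipaddress.ip_network("10.1.5.0/27")   # web servers from the slide

# Hypothetical whitelist: first matching rule wins, unmatched traffic is denied.
RULES = [
    {"src": ipaddress.ip_network("0.0.0.0/0"), "dst": WEB_SERVERS,
     "proto": "tcp", "port": 80, "action": "permit"},
]


def permitted(rules, src, dst, proto, port):
    """Evaluate a single flow against the rules in order."""
    src_ip = ipaddress.ip_address(src)
    dst_ip = ipaddress.ip_address(dst)
    for rule in rules:
        if (src_ip in rule["src"] and dst_ip in rule["dst"]
                and proto == rule["proto"] and port == rule["port"]):
            return rule["action"] == "permit"
    return False


# The /27 is the first 32 addresses of the /24 host subnet.
print(WEB_SERVERS.subnet_of(HOST_SUBNET), WEB_SERVERS.num_addresses)
print(permitted(RULES, "198.51.100.7", "10.1.5.10", "tcp", 80))   # inside the /27
print(permitted(RULES, "198.51.100.7", "10.1.5.40", "tcp", 80))   # outside the /27
```

 The second and third calls differ only in destination: 10.1.5.40 sits in the /24 host subnet but outside the /27, so the whitelist does not open it.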
