Seamless BGP Migration with Router Grafting

Eric Keller, Jennifer Rexford (Princeton University), Kobus van der Merwe (AT&T Research). NSDI 2010.


SLIDE 1

Seamless BGP Migration with Router Grafting

Eric Keller, Jennifer Rexford (Princeton University)
Kobus van der Merwe (AT&T Research)
NSDI 2010

SLIDE 2

Dealing with Change

  • Networks need to be highly reliable

– To avoid service disruptions

  • Operators need to deal with change

– Install, maintain, upgrade, or decommission equipment
– Deploy new services
– Manage resource usage (CPU, bandwidth)

  • But… change causes disruption

– Forcing a tradeoff

SLIDE 3

Why is Change so Hard?

  • Root cause is the monolithic view of a router

(Hardware, software, and links as one entity)

Revisit the design to make dealing with change easier

SLIDE 5

Our Approach: Grafting

  • In nature: take from one, merge into another

– Plants, skin, tissue

  • Router Grafting

– To break the monolithic view
– Focus on moving a link (and the corresponding BGP session)

SLIDE 6

Why Move Links?

SLIDE 7

Planned Maintenance

  • Shut down router to…

– Replace power supply
– Upgrade to new model
– Contract network

  • Add router to…

– Expand network

SLIDE 8

Planned Maintenance

  • Could migrate links to other routers

– Away from the router being shut down, or
– To the router being added (or brought back up)

SLIDE 10

Customer Requests a Feature

Network has a mixture of routers from different vendors

  • Rehome customer to a router with the needed feature

SLIDE 11

Traffic Management

Typical traffic engineering:

  • Adjust routing protocol parameters based on traffic

[Figure: network with a congested link]

SLIDE 13

Traffic Management

Instead…

  • Rehome customer to change traffic matrix

SLIDE 15

Understanding the Disruption (today)

1) Reconfigure old router, remove old link
2) Add new link, configure new router
3) Establish new BGP session (exchange routes)

Router commands shown: delete neighbor 1.2.3.4, then add neighbor 1.2.3.4

Downtime (Minutes)

SLIDE 18

Router Grafting: Breaking up the router

[Figure: send state, move link]

Router grafting enables this breaking apart of a router (splitting/merging).

SLIDE 20

Not Just State Transfer

[Figure: migrating a session; neighboring networks AS100, AS200, AS300, AS400]

The topology changes

(Need to re-run decision processes)
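Why the decision process has to run again can be illustrated with a small sketch. It is not the BGP decision process from the talk or from any router: it assumes a deliberately simplified ranking (highest local preference, then shortest AS path), and the Route class, the prefix, and the AS paths are hypothetical labels borrowed from the figure.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]   # AS numbers, nearest neighbor first
    local_pref: int = 100      # hypothetical default preference


def best_route(candidates: Iterable[Route]) -> Route:
    # Simplified ranking (an assumption): highest local_pref, then shortest
    # AS path, then the AS path tuple itself as a deterministic tie-break.
    return min(candidates, key=lambda r: (-r.local_pref, len(r.as_path), r.as_path))


def run_decision(rib_in: Iterable[Route]) -> Dict[str, Route]:
    # Group candidate routes by prefix and pick one best route per prefix.
    by_prefix: Dict[str, List[Route]] = {}
    for route in rib_in:
        by_prefix.setdefault(route.prefix, []).append(route)
    return {prefix: best_route(routes) for prefix, routes in by_prefix.items()}


# Routes the migrate-to router already holds (learned via AS200).
existing = [Route("10.0.0.0/8", (200, 300, 400))]
# Routes that arrive with the grafted session (learned directly from AS400).
grafted = [Route("10.0.0.0/8", (400,))]

before = run_decision(existing)
after = run_decision(existing + grafted)
print(before["10.0.0.0/8"].as_path, "->", after["10.0.0.0/8"].as_path)
```

In this toy case the grafted session brings a shorter path, so the best route at the migrate-to router changes even though no remote network changed anything.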

SLIDE 22

Goals

  • Routing and forwarding should not be disrupted

– Data packets are not dropped
– Routing protocol adjacencies do not go down
– All route announcements are received

  • Change should be transparent

– Neighboring routers/operators should not be involved
– Redesign the routers, not the protocols

SLIDE 23

Challenge: Protocol Layers

[Figure: protocol stacks on routers A, B, and C. BGP exchanges routes, TCP delivers a reliable stream, IP sends packets over the physical link. Labels: migrate link, migrate state.]

SLIDE 24

Physical Link

[Figure: protocol-layer diagram (BGP, TCP, IP, physical link; routers A, B, C)]

SLIDE 25

Physical Link

  • Unplugging cable would be disruptive
  • Links are not physical wires

– Switchover in nanoseconds

[Figure: remote end-point, migrate-from router, migrate-to router]

SLIDE 27

IP

[Figure: protocol-layer diagram (BGP, TCP, IP, physical link; routers A, B, C)]

SLIDE 28

Changing IP Address

  • IP address is an identifier in BGP
  • Changing it would require neighbor to reconfigure

– Not transparent
– Also has impact on TCP (later)

[Figure: remote end-point, migrate-from router, migrate-to router; addresses 1.1.1.1 and 1.1.1.2]

SLIDE 29

Re-assign IP Address

  • IP address not used for global reachability

– Can move with BGP session
– Neighbor doesn't have to reconfigure

[Figure: remote end-point, migrate-from router, migrate-to router; addresses 1.1.1.1 and 1.1.1.2]

SLIDE 30

TCP

[Figure: protocol-layer diagram (BGP, TCP, IP, physical link; routers A, B, C)]

SLIDE 31

Dealing with TCP

  • TCP sessions are long running in BGP

– Killing it implicitly signals the router is down

  • BGP and TCP extensions as a workaround

(not supported on all routers)

SLIDE 32

Migrating TCP Transparently

  • Capitalize on IP address not changing

– To keep it completely transparent

  • Transfer the TCP session state

– Sequence numbers
– Packet input/output queue (packets not read/ack'd)

[Figure: application and OS. The app calls send() and recv(); the OS exchanges TCP(data, seq, …), ack, and TCP(data', seq') segments]
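The state listed above can be pictured as a small snapshot that is serialized on the migrate-from router and restored on the migrate-to router. The sketch below is only an illustration of that idea: the TcpSnapshot fields, their names, and the JSON encoding are assumptions, and it does not use or model the SockMi interface.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class TcpSnapshot:
    # All field names are hypothetical; a real kernel tracks far more state.
    local_addr: str           # unchanged across the migration, e.g. "1.1.1.1:179"
    remote_addr: str
    snd_nxt: int              # next sequence number to send
    rcv_nxt: int              # next sequence number expected from the peer
    unacked_out: List[str] = field(default_factory=list)  # sent, not yet ack'd (hex)
    unread_in: List[str] = field(default_factory=list)    # received, not yet read (hex)


def export_snapshot(snapshot: TcpSnapshot) -> bytes:
    # Serialize on the migrate-from router.
    return json.dumps(asdict(snapshot)).encode("utf-8")


def import_snapshot(blob: bytes) -> TcpSnapshot:
    # Rebuild on the migrate-to router.
    return TcpSnapshot(**json.loads(blob.decode("utf-8")))


original = TcpSnapshot(
    local_addr="1.1.1.1:179",
    remote_addr="1.1.1.2:40000",
    snd_nxt=1001,
    rcv_nxt=5001,
    unacked_out=[b"KEEPALIVE".hex()],
    unread_in=[b"UPDATE".hex()],
)
restored = import_snapshot(export_snapshot(original))
print(restored == original)
```

Because the local address travels with the session, the remote end-point sees the same addresses and sequence numbers after the move; segments lost in between are covered by ordinary TCP retransmission.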

SLIDE 33

BGP

[Figure: protocol-layer diagram (BGP, TCP, IP, physical link; routers A, B, C)]

SLIDE 34

BGP: What (not) to Migrate

  • Requirements

– Want data packets to be delivered
– Want routing adjacencies to remain up

  • Need

– Configuration
– Routing information

  • Do not need (but can have)

– State machine
– Statistics
– Timers

  • Keeps code modifications to a minimum

SLIDE 35

Routing Information

  • Could involve remote end-point

– Similar exchange as with a new BGP session
– Migrate-to router sends entire state to remote end-point
– Ask remote end-point to re-send all routes it advertised

  • Disruptive

– Makes remote end-point do significant work

[Figure: remote end-point, migrate-from router, migrate-to router]

SLIDE 36

Routing Information (optimization)

Migrate-from router sends the migrate-to router:

  • The routes it learned

– Instead of making remote end-point re-announce

  • The routes it advertised

– So able to send just an incremental update

[Figure: remote end-point, migrate-from router, migrate-to router; the migrate-from router sends the routes advertised/learned]
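The incremental update mentioned above amounts to a difference between two tables: what the migrate-from router had advertised to the remote end-point, and what the migrate-to router now selects as best. A minimal sketch follows, assuming routes are reduced to a prefix and an AS path; the function name and sample data are hypothetical.

```python
from typing import Dict, List, Tuple

Attrs = Tuple[int, ...]  # stand-in for route attributes (here: an AS path)


def incremental_update(
    advertised_by_old: Dict[str, Attrs],
    best_at_new: Dict[str, Attrs],
) -> Tuple[Dict[str, Attrs], List[str]]:
    # Announce only prefixes that are new or whose attributes changed.
    announce = {
        prefix: attrs
        for prefix, attrs in best_at_new.items()
        if advertised_by_old.get(prefix) != attrs
    }
    # Withdraw prefixes the remote end-point was told about but that are gone.
    withdraw = sorted(p for p in advertised_by_old if p not in best_at_new)
    return announce, withdraw


old_adv = {
    "10.0.0.0/8": (100, 300),
    "192.0.2.0/24": (100, 200),
    "198.51.100.0/24": (100,),
}
new_best = {
    "10.0.0.0/8": (100, 300),         # unchanged: nothing is sent
    "192.0.2.0/24": (100, 400, 200),  # changed path: re-announce
    "203.0.113.0/24": (100,),         # new prefix: announce
}
announce, withdraw = incremental_update(old_adv, new_best)
print(announce, withdraw)
```

Only the differences reach the remote end-point, so it never sees a full table exchange.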

SLIDE 37

Migration in the Background

  • Migration takes a while

– A lot of routing state to transfer
– A lot of processing is needed

  • Routing changes can happen at any time
  • Disruptive if not done in the background

[Figure: remote end-point, migrate-to router, migrate-from router]

SLIDE 38

While exporting routing state

[Figure: remote end-point, migrate-to router, migrate-from router. In-memory: p1, p2, p3, p4. Dump: p1, p2]

BGP is incremental, append update
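One way to read "append update" is that a change arriving mid-export is written to the end of the dump as another incremental record, and the importer replays the records in order so the latest one wins. The sketch below assumes that reading; the record format and function names are hypothetical and not taken from the prototype.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, str, Optional[str]]  # (operation, prefix, attributes)


def export_rib(
    rib: Dict[str, str],
    updates_during_export: List[Tuple[str, Optional[str]]],
) -> List[Record]:
    # Walk the table as it looked when the export started.
    dump: List[Record] = [("announce", prefix, attrs) for prefix, attrs in rib.items()]
    # Changes that arrive meanwhile are appended, never merged into earlier records.
    for prefix, attrs in updates_during_export:
        if attrs is None:
            dump.append(("withdraw", prefix, None))
        else:
            dump.append(("announce", prefix, attrs))
    return dump


def replay(dump: List[Record]) -> Dict[str, str]:
    # Replaying in order means the most recent record for a prefix wins.
    table: Dict[str, str] = {}
    for operation, prefix, attrs in dump:
        if operation == "announce":
            table[prefix] = attrs
        else:
            table.pop(prefix, None)
    return table


rib = {"p1": "attrs-1", "p2": "attrs-2"}
live = [("p2", "attrs-2b"), ("p1", None), ("p3", "attrs-3")]
final_table = replay(export_rib(rib, live))
print(final_table)
```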

SLIDE 39

While moving TCP session and link

[Figure: remote end-point, migrate-to router, migrate-from router]

TCP will retransmit

SLIDE 40

While importing routing state

[Figure: remote end-point, migrate-to router, migrate-from router. In-memory: p1, p2. Dump: p1, p2, p3, p4]

BGP is incremental, ignore dump file
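The import side can be sketched the same way: if the live session has already said something about a prefix, the dump entry for it is stale and is skipped. This is an illustrative reading of the slide under that assumption; import_dump and the live_touched set are hypothetical names.

```python
from typing import Dict, Set


def import_dump(
    dump: Dict[str, str],
    table: Dict[str, str],
    live_touched: Set[str],
) -> Dict[str, str]:
    # live_touched holds every prefix announced or withdrawn over the live
    # session since the import began; for those, the dump entry is ignored.
    for prefix, attrs in dump.items():
        if prefix in live_touched:
            continue
        table[prefix] = attrs
    return table


# Matches the slide: p1 and p2 are already in memory, the dump holds p1..p4.
table = {"p1": "new-1", "p2": "new-2"}
dump = {"p1": "old-1", "p2": "old-2", "p3": "attrs-3", "p4": "attrs-4"}
result = import_dump(dump, table, live_touched={"p1", "p2"})
print(result)
```

Tracking touched prefixes rather than only the table contents also keeps a prefix that was withdrawn live from being resurrected by the dump.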

SLIDE 41

Special Case: Cluster Router

  • Don't need to re-run decision processes
  • Links 'migrated' internally

[Figure: cluster router with blades and line cards connected by a switching fabric; links A, B, C, D]

SLIDE 43

Prototype

  • Added grafting into Quagga

– Import/export routes, new 'inactive' state
– Routing data and decision process well separated

  • Graft daemon to control process
  • SockMi for TCP migration

[Figure: prototype setup]

Graftable Router: modified Quagga, graft daemon, SockMi.ko; Linux kernel 2.6.19.7
Emulated link migration: Handler, Comm, click.ko; Linux kernel 2.6.19.7-click
Unmod. Router: Quagga; Linux kernel 2.6.19.7

SLIDE 44

Evaluation

  • Impact on migrating routers
  • Disruption to network operation
  • Overhead on rest of the network

SLIDE 46

Impact on Migrating Routers

  • How long migration takes

– Includes export, transmit, import, lookup, decision
– CPU utilization roughly 25%

[Figure: migration time (seconds) versus RIB size (# prefixes)]

Between routers: 0.9 s (20k prefixes), 6.9 s (200k prefixes)
Between blades: 0.3 s (20k prefixes), 3.1 s (200k prefixes)
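For a rough sense of scale, the two reported points per case can be joined by a straight line. Linearity in RIB size is an assumption made here for illustration, not a claim from the talk, and estimate_seconds is a hypothetical helper.

```python
from typing import Tuple

Point = Tuple[int, float]  # (number of prefixes, migration time in seconds)

# The two data points reported on the slide for each case.
BETWEEN_ROUTERS: Tuple[Point, Point] = ((20_000, 0.9), (200_000, 6.9))
BETWEEN_BLADES: Tuple[Point, Point] = ((20_000, 0.3), (200_000, 3.1))


def estimate_seconds(prefixes: int, points: Tuple[Point, Point]) -> float:
    # Straight line through the two reported points (assumed, not measured).
    (x0, y0), (x1, y1) = points
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (prefixes - x0)


for label, points in (("routers", BETWEEN_ROUTERS), ("blades", BETWEEN_BLADES)):
    print(label, round(estimate_seconds(110_000, points), 2), "seconds at 110k prefixes")
```

Under that assumption the between-routers case costs about 33 microseconds per additional prefix, since (6.9 - 0.9) / 180,000 is roughly 0.000033 seconds.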

SLIDE 47

Disruption to Network Operation

  • Data traffic affected by not having a link

– Nanoseconds

  • Routing protocols affected by unresponsiveness

– Set old router to “inactive”, migrate link, migrate TCP, set new router to “active”
– Milliseconds

SLIDE 48

Conclusions and Future Work

  • Enables moving a single link/session with…

– Minimal code change
– No impact on data traffic
– No visible impact on routing protocol adjacencies
– Minimal overhead on rest of network

  • Future work

– Explore applications
– Generalize grafting (multiple sessions, different protocols, other resources)

SLIDE 49

Questions?

Contact info:
ekeller@princeton.edu
http://www.princeton.edu/~ekeller