Distance Vector Setting
Each node computes its forwarding table in a distributed setting:
1. Nodes know only the cost to their neighbors, not the topology
2. Nodes can talk only to their neighbors, using messages
3. All nodes run the same algorithm concurrently
4. Nodes and links may fail, and messages may be lost
CSE 461 University of Washington
Distance Vector Algorithm
Each node maintains a vector of (distance, next hop) to all destinations
1. Initialize the vector with 0 (zero) cost to self and ∞ (infinity) to all other destinations
2. Periodically send the vector to neighbors
3. Update the vector for each destination by selecting the shortest distance heard, after adding the cost of the link to the neighbor that reported it
4. Use the best neighbor for forwarding
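The update in step 3 can be sketched as follows. This is a minimal illustration, not a full protocol: the function name `dv_update` and its arguments are hypothetical, and the input vectors are the ones B and E advertise to A in the second exchange of the walkthrough slides.

```python
import math

INF = math.inf

def dv_update(self_name, link_costs, heard):
    """One distance-vector update at a single node.

    link_costs: neighbor -> cost of the direct link to that neighbor
    heard: neighbor -> that neighbor's advertised vector (destination -> distance)
    Returns destination -> (distance, next_hop).
    """
    destinations = {self_name}
    for vector in heard.values():
        destinations.update(vector)

    table = {}
    for dest in sorted(destinations):
        if dest == self_name:
            table[dest] = (0, None)  # zero cost to self, no next hop
            continue
        best = (INF, None)
        for neighbor, vector in heard.items():
            # Distance the neighbor reports, plus the cost of reaching that neighbor.
            candidate = link_costs[neighbor] + vector.get(dest, INF)
            if candidate < best[0]:
                best = (candidate, neighbor)
        table[dest] = best
    return table

# Vectors advertised by B and E in the second exchange of the walkthrough.
b_says = {"A": 4, "B": 0, "C": 2, "D": INF, "E": 4, "F": 3, "G": 3, "H": INF}
e_says = {"A": 10, "B": 4, "C": 1, "D": 2, "E": 0, "F": 2, "G": INF, "H": INF}

table_a = dv_update("A", {"B": 4, "E": 10}, {"B": b_says, "E": e_says})
for dest, (cost, next_hop) in table_a.items():
    print(dest, cost, next_hop)
```

A real implementation would also keep the previous table and time out routes that are no longer refreshed; that is left out here to keep the update rule visible.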
Distance Vector (2)
• Consider from the point of view of node A
• A can talk only to nodes B and E
[Figure: example network of nodes A to H with link costs]
A's initial vector:
To | Cost
A | 0
B | ∞
C | ∞
D | ∞
E | ∞
F | ∞
G | ∞
H | ∞
Distance Vector (3)
• First exchange with B, E; learn best 1-hop routes
• Learned better routes to B and E
To | B says | E says | B +4 | E +10 | A's cost | A's next
A | ∞ | ∞ | ∞ | ∞ | 0 | --
B | 0 | ∞ | 4 | ∞ | 4 | B
C | ∞ | ∞ | ∞ | ∞ | ∞ | --
D | ∞ | ∞ | ∞ | ∞ | ∞ | --
E | ∞ | 0 | ∞ | 10 | 10 | E
F | ∞ | ∞ | ∞ | ∞ | ∞ | --
G | ∞ | ∞ | ∞ | ∞ | ∞ | --
H | ∞ | ∞ | ∞ | ∞ | ∞ | --
Distance Vector (4)
• Second exchange; learn best 2-hop routes
To | B says | E says | B +4 | E +10 | A's cost | A's next
A | 4 | 10 | 8 | 20 | 0 | --
B | 0 | 4 | 4 | 14 | 4 | B
C | 2 | 1 | 6 | 11 | 6 | B
D | ∞ | 2 | ∞ | 12 | 12 | E
E | 4 | 0 | 8 | 10 | 8 | B
F | 3 | 2 | 7 | 12 | 7 | B
G | 3 | ∞ | 7 | ∞ | 7 | B
H | ∞ | ∞ | ∞ | ∞ | ∞ | --
Distance Vector (4)
• Third exchange; learn best 3-hop routes
To | B says | E says | B +4 | E +10 | A's cost | A's next
A | 4 | 8 | 8 | 18 | 0 | --
B | 0 | 3 | 4 | 13 | 4 | B
C | 2 | 1 | 6 | 11 | 6 | B
D | 4 | 2 | 8 | 12 | 8 | B
E | 3 | 0 | 7 | 10 | 7 | B
F | 3 | 2 | 7 | 12 | 7 | B
G | 3 | 6 | 7 | 16 | 7 | B
H | 5 | 4 | 9 | 14 | 9 | B
Distance Vector (5)
• Subsequent exchanges; converged
To | B says | E says | B +4 | E +10 | A's cost | A's next
A | 4 | 7 | 8 | 17 | 0 | --
B | 0 | 3 | 4 | 13 | 4 | B
C | 2 | 1 | 6 | 11 | 6 | B
D | 4 | 2 | 8 | 12 | 8 | B
E | 3 | 0 | 7 | 10 | 7 | B
F | 3 | 2 | 7 | 12 | 7 | B
G | 3 | 6 | 7 | 16 | 7 | B
H | 5 | 4 | 9 | 14 | 9 | B
Distance Vector Dynamics
• Adding routes:
  • News travels one hop per exchange
• Removing routes:
  • When a node fails, it takes part in no more exchanges, and other nodes eventually forget its routes
Problem?
Count to Infinity: Problem
• Good news travels quickly, bad news slowly (inferred)
[Figure: a link fails (X); desired convergence versus the “count to infinity” scenario]
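A simplified sketch of the “count to infinity” scenario, under stated assumptions: a chain A, B, X with unit link costs, the B to X link has just failed, and the two remaining nodes alternate updates. The finite `INFINITY = 16` and the function name are illustrative choices, not part of the slide.

```python
INFINITY = 16  # assumed finite stand-in for "unreachable"

def count_to_infinity(max_rounds=20):
    """Track the cost to X that A and B believe after each pair of updates."""
    # A still holds a stale route to X via B (cost 2); B has just lost its link to X.
    a_cost, b_cost = 2, INFINITY
    history = []
    for _ in range(max_rounds):
        # B hears A's stale advertisement and routes to X through A.
        b_cost = min(INFINITY, a_cost + 1)
        # A's only route to X is via B, so it adopts B's new, larger cost.
        a_cost = min(INFINITY, b_cost + 1)
        history.append((a_cost, b_cost))
        if a_cost == INFINITY and b_cost == INFINITY:
            break
    return history

history = count_to_infinity()
for a_cost, b_cost in history:
    print("A:", a_cost, "B:", b_cost)
```

Each round raises the believed cost by two, so the bad news is only accepted once both nodes have counted all the way up to the chosen infinity.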
Count to Infinity: Heuristics
• “Split horizon”
  • Don’t send a route back to where you learned it from
• “Poison reverse”
  • Send “infinity” when you notice a disconnect
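One way to picture split horizon is as a filter applied when building the vector for each neighbor. The sketch below is an illustration only: `advertisement` and its `mode` argument are hypothetical names, and the `"poisoned"` mode is the common "split horizon with poisoned reverse" variant, included as an assumption because the slide describes poison reverse only briefly.

```python
import math

INF = math.inf

def advertisement(table, neighbor, mode="split_horizon"):
    """Build the vector that a node sends to one neighbor.

    table: destination -> (distance, next_hop)
    mode: "plain" sends everything, "split_horizon" omits routes learned
    from `neighbor`, "poisoned" advertises those routes as unreachable.
    """
    vector = {}
    for dest, (distance, next_hop) in table.items():
        learned_from_neighbor = next_hop == neighbor
        if learned_from_neighbor and mode == "split_horizon":
            continue  # do not send the route back to where it was learned
        if learned_from_neighbor and mode == "poisoned":
            vector[dest] = INF  # send it back, but as unreachable
        else:
            vector[dest] = distance
    return vector

# Part of A's table after the second exchange in the walkthrough.
table_a = {"A": (0, None), "B": (4, "B"), "C": (6, "B"), "D": (12, "E"), "E": (8, "B")}

to_b = advertisement(table_a, "B")
to_b_poisoned = advertisement(table_a, "B", mode="poisoned")
to_e = advertisement(table_a, "E")
print(to_b, to_b_poisoned, to_e)
```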
Count to Infinity: Heuristics (2)
• Neither split horizon nor poison reverse is very effective in practice
• Link state is now favored except when resource-limited
RIP (Routing Information Protocol)
• DV protocol with hop count as metric
• Infinity is 16 hops; limits network size
• Includes split horizon, poison reverse
• Routers send vectors every 30 seconds
• Runs on top of UDP
• Time-out in 180 secs to detect failures
• RIPv1 specified in RFC1058 (1988)
Link-State Routing
Link-State Routing
• Second broad class of routing algorithms
• More computation than DV but better dynamics
• Widely used in practice
  • Used in Internet/ARPANET from 1979
  • Modern networks use OSPF (L3) and IS-IS (L2)
Link-State Setting
Same distributed setting as for distance vector:
1. Nodes know only the cost to their neighbors, not the topology
2. Nodes can talk only to their neighbors, using messages
3. All nodes run the same algorithm concurrently
4. Nodes and links may fail, and messages may be lost
Link-State Algorithm
Proceeds in two phases:
1. Nodes flood the topology in link state packets
  • Each node learns the full topology
2. Each node computes its own forwarding table
  • By running Dijkstra (or equivalent)
Part 1: Flood Routing
Flooding
• Rule used at each node:
  • Send an incoming message on to all other neighbors
  • Remember the message so that it is only flooded once
Flooding (2)
• Consider a flood from A; it first reaches B via AB and E via AE
[Figure: example network of nodes A to H]
Flooding (3)
• Next B floods BC, BE, BF, BG, and E floods EB, EC, ED, EF
• E and B send to each other
[Figure: example network of nodes A to H]
Flooding (4)
• C floods CD, CH; D floods DC; F floods FG; G floods GF
• F gets another copy
[Figure: example network of nodes A to H]
Flooding (5)
• H has no-one to flood … and we’re done
• Each link carries the message, in at least one direction
[Figure: example network of nodes A to H]
Flooding Details
• Remember message (to stop flood) using source and sequence number
  • So next message (with higher sequence number) will go through
• To make flooding reliable, use ARQ
  • So receiver acknowledges, and sender resends if needed
Problem?
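The flooding rule and the (source, sequence number) memory can be sketched as a round-by-round simulation. The adjacency list is reconstructed from the flooding walkthrough and is an assumption, as are the names `NEIGHBORS` and `flood`; ARQ and message loss are not modeled.

```python
# Adjacency reconstructed from the flooding walkthrough (an assumption).
NEIGHBORS = {
    "A": ["B", "E"],
    "B": ["A", "C", "E", "F", "G"],
    "C": ["B", "D", "E", "H"],
    "D": ["C", "E"],
    "E": ["A", "B", "C", "D", "F"],
    "F": ["B", "E", "G"],
    "G": ["B", "F"],
    "H": ["C"],
}

def flood(source, seq):
    """Flood one message in synchronous rounds; return (nodes reached, copies sent)."""
    message_id = (source, seq)
    seen = {node: set() for node in NEIGHBORS}  # message ids each node remembers
    seen[source].add(message_id)
    outgoing = [(source, nbr) for nbr in NEIGHBORS[source]]
    sent = []
    while outgoing:
        sent.extend(outgoing)
        # Group this round's copies by receiver.
        arrivals = {}
        for sender, receiver in outgoing:
            arrivals.setdefault(receiver, set()).add(sender)
        outgoing = []
        for receiver in sorted(arrivals):
            if message_id in seen[receiver]:
                continue  # duplicate: already flooded once, so drop it
            seen[receiver].add(message_id)
            # Send on to all neighbors except those the message just came from.
            for nbr in NEIGHBORS[receiver]:
                if nbr not in arrivals[receiver]:
                    outgoing.append((receiver, nbr))
    reached = {node for node, ids in seen.items() if message_id in ids}
    return reached, sent

reached, sent = flood("A", seq=1)
copies_at_f = sum(1 for _, receiver in sent if receiver == "F")
print(sorted(reached), len(sent), copies_at_f)
```

Counting the copies that arrive at F makes the cost of flooding visible: every node is reached, but some nodes receive the same message several times.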
Flooding Problem
• F receives the same message multiple times
• E and B send to each other too
[Figure: example network of nodes A to H]
Part 2: Dijkstra’s Algorithm
Edsger W. Dijkstra (1930-2002)
• Famous computer scientist
  • Programming languages
  • Distributed algorithms
  • Program verification
• Dijkstra’s algorithm, 1959
  • Single-source shortest paths, given network with non-negative link costs
[Photo: By Hamilton Richards, CC-BY-SA-3.0, via Wikimedia Commons]
Dijkstra’s Algorithm
Algorithm:
• Mark all nodes tentative, set distances from source to 0 (zero) for source, and ∞ (infinity) for all other nodes
• While tentative nodes remain:
  • Extract N, a node with lowest distance
  • Add link to N to the shortest path tree
  • Relax the distances of neighbors of N by lowering any better distance estimates
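A compact sketch of the algorithm using a binary heap as the set of tentative nodes. The link costs in `LINKS` are reconstructed from the walkthrough figures and should be read as an assumption; the helper names are illustrative.

```python
import heapq
import math

# Link costs reconstructed from the walkthrough figures (an assumption).
LINKS = [
    ("A", "B", 4), ("A", "E", 10), ("B", "C", 2), ("B", "E", 4),
    ("B", "F", 3), ("B", "G", 3), ("C", "D", 2), ("C", "E", 1),
    ("C", "H", 3), ("D", "E", 2), ("E", "F", 2), ("F", "G", 4),
]

def build_graph(links):
    """Turn a list of undirected (u, v, cost) links into an adjacency map."""
    graph = {}
    for u, v, cost in links:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph

def dijkstra(graph, source):
    """Return (distance, parent-in-shortest-path-tree) for every node."""
    dist = {node: math.inf for node in graph}
    parent = {node: None for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    done = set()  # nodes no longer tentative
    while heap:
        d, node = heapq.heappop(heap)  # extract a node with lowest distance
        if node in done:
            continue  # stale heap entry for an already finished node
        done.add(node)
        for nbr, cost in graph[node].items():
            if d + cost < dist[nbr]:  # relax: a better estimate was found
                dist[nbr] = d + cost
                parent[nbr] = node
                heapq.heappush(heap, (dist[nbr], nbr))
    return dist, parent

dist, parent = dijkstra(build_graph(LINKS), "A")
print(dist)
```

With a binary heap this runs in O((|V| + |E|) log |V|); stale entries are skipped instead of being decreased in place, which keeps the code short.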
Dijkstra’s Algorithm (2)
• Initialization
• We’ll compute shortest paths from A
[Figure: tentative distances: A 0, all other nodes ∞]
Dijkstra’s Algorithm (3)
• Relax around A
[Figure: tentative distances: A 0, B 4, E 10, all other nodes ∞]
Dijkstra’s Algorithm (4)
• Relax around B
• Distance to E fell (10 to 8)
[Figure: tentative distances: A 0, B 4, C 6, E 8, F 7, G 7, D ∞, H ∞]
Dijkstra’s Algorithm (5)
• Relax around C
• Distance to E fell again (8 to 7)
[Figure: tentative distances: A 0, B 4, C 6, D 8, E 7, F 7, G 7, H 9]
Dijkstra’s Algorithm (6)
• Relax around G (say)
• Didn’t fall …
[Figure: distances unchanged: A 0, B 4, C 6, D 8, E 7, F 7, G 7, H 9]
Dijkstra’s Algorithm (7)
• Relax around F (say)
• Relax has no effect
[Figure: distances unchanged: A 0, B 4, C 6, D 8, E 7, F 7, G 7, H 9]
Dijkstra’s Algorithm (8)
• Relax around E
[Figure: distances unchanged: A 0, B 4, C 6, D 8, E 7, F 7, G 7, H 9]
Dijkstra’s Algorithm (9)
• Relax around D
[Figure: distances unchanged: A 0, B 4, C 6, D 8, E 7, F 7, G 7, H 9]
Dijkstra’s Algorithm (10)
• Finally, H … done
[Figure: final distances: A 0, B 4, C 6, D 8, E 7, F 7, G 7, H 9]
Dijkstra Comments
• Finds shortest paths in order of increasing distance from source
  • Leverages optimality property
• Runtime depends on cost of extracting min-cost node
  • Superlinear in network size (grows fast)
  • Using Fibonacci heaps, the complexity is O(|E| + |V| log |V|)
• Gives complete source/sink tree
  • More than needed for forwarding!
  • But requires complete topology
Bringing it all together…
Phase 1: Topology Dissemination
• Each node floods a link state packet (LSP) that describes its portion of the topology
• Node E’s LSP is flooded to A, B, C, D, and F
Node E’s LSP:
Seq. #
A | 10
B | 4
C | 1
D | 2
F | 2
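An LSP and the per-node store of received LSPs might be modeled as below. This is a sketch under assumptions: the class and field names are hypothetical, the sequence numbers 1 and 2 are made up for illustration, and aging of old LSPs is not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class LSP:
    source: str  # node that originated the packet
    seq: int     # sequence number; higher means newer
    links: dict  # neighbor -> link cost

@dataclass
class LinkStateDatabase:
    lsps: dict = field(default_factory=dict)  # source -> newest LSP seen

    def install(self, lsp):
        """Store the LSP if it is newer than the one held.

        Returns True when the LSP was new and should be flooded onward.
        """
        current = self.lsps.get(lsp.source)
        if current is not None and current.seq >= lsp.seq:
            return False  # old or duplicate: drop it and stop the flood here
        self.lsps[lsp.source] = lsp
        return True

    def topology(self):
        """Combine all stored LSPs into an adjacency map."""
        return {source: dict(lsp.links) for source, lsp in self.lsps.items()}

e_links = {"A": 10, "B": 4, "C": 1, "D": 2, "F": 2}  # Node E's LSP from the slide

db = LinkStateDatabase()
first = db.install(LSP("E", 1, e_links))
duplicate = db.install(LSP("E", 1, e_links))
newer = db.install(LSP("E", 2, e_links))
print(first, duplicate, newer, db.topology())
```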
Phase 2: Route Computation
• Each node has full topology
  • By combining all LSPs
• Each node simply runs Dijkstra
  • Replicated computation, but finds required routes directly
  • Compile forwarding table from sink/source tree
  • That’s it folks!
Forwarding Table
[Figure: source tree for E (from Dijkstra)]
E’s Forwarding Table:
To | Next
A | C
B | C
C | C
D | D
E | --
F | F
G | F
H | C
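Compiling the forwarding table amounts to remembering, for each destination, the first hop on its shortest path. The sketch below does this inside Dijkstra; the link costs are reconstructed from the figures (an assumption) and `forwarding_table` is an illustrative name.

```python
import heapq
import math

# Link costs reconstructed from the walkthrough figures (an assumption).
LINKS = [
    ("A", "B", 4), ("A", "E", 10), ("B", "C", 2), ("B", "E", 4),
    ("B", "F", 3), ("B", "G", 3), ("C", "D", 2), ("C", "E", 1),
    ("C", "H", 3), ("D", "E", 2), ("E", "F", 2), ("F", "G", 4),
]

def forwarding_table(links, source):
    """Run Dijkstra from `source` and return destination -> next hop."""
    graph = {}
    for u, v, cost in links:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost

    dist = {node: math.inf for node in graph}
    dist[source] = 0
    first_hop = {source: None}
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in graph[node].items():
            if d + cost < dist[nbr]:
                dist[nbr] = d + cost
                # Neighbors of the source are their own first hop;
                # everything further away inherits the hop of its parent.
                first_hop[nbr] = nbr if node == source else first_hop[node]
                heapq.heappush(heap, (dist[nbr], nbr))
    return first_hop

table_e = forwarding_table(LINKS, "E")
for dest in sorted(table_e):
    print(dest, table_e[dest])
```

G is reachable from E at cost 6 both via F and via C and B; this sketch keeps the first route found, which here is the one via F.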
Handling Changes
• On change, flood updated LSPs and re-compute routes
  • E.g., nodes adjacent to the failed link or node initiate
[Figure: a link fails; F’s LSP and B’s LSP are re-flooded with entries marked ∞]
Handling Changes (2)
• Link failure
  • Both nodes notice, send updated LSPs
  • Link is removed from topology
• Node failure
  • All neighbors notice a link has failed
  • Failed node can’t update its own LSP
  • But it is OK: all links to node removed
Handling Changes (3)
• Addition of a link or node
  • Add LSP of new node to topology
  • Old LSPs are updated with new link
• Additions are the easy case …
Link-State Complications
• Things that can go wrong:
  • Seq. number reaches max, or is corrupted
  • Node crashes and loses seq. number
  • Network partitions then heals
• Strategy:
  • Include age on LSPs and forget old information that is not refreshed
• Much of the complexity is due to handling corner cases
DV/LS Comparison
Goal | Distance Vector | Link-State
Correctness | Distributed Bellman-Ford | Replicated Dijkstra
Efficient paths | Approx. with shortest paths | Approx. with shortest paths
Fair paths | Approx. with shortest paths | Approx. with shortest paths
Fast convergence | Slow – many exchanges | Fast – flood and compute
Scalability | Excellent – storage/compute | Moderate – storage/compute
IS-IS and OSPF Protocols
• Widely used in large enterprise and ISP networks
  • IS-IS = Intermediate System to Intermediate System
  • OSPF = Open Shortest Path First
• Link-state protocols with many added features
  • E.g., “Areas” for scalability
Equal-Cost Multi-Path Routing
Multipath Routing
• Allow multiple routing paths from node to destination to be used at once
  • Topology has them for redundancy
  • Using them can improve performance
• Questions:
  • How do we find multiple paths?
  • How do we send traffic along them?
Equal-Cost Multipath Routes
• One form of multipath routing
• Extends shortest path model by keeping a set of paths when there are ties
• Consider A → E
  • ABE = 4 + 4 = 8
  • ABCE = 4 + 2 + 2 = 8
  • ABCDE = 4 + 2 + 1 + 1 = 8
  • Use them all!
[Figure: example network of nodes A to H with link costs adjusted to create ties]
Source “Trees”
• With ECMP, source/sink “tree” is a directed acyclic graph (DAG)
  • Each node has set of next hops
  • Still a compact representation
[Figure: a tree compared with a DAG]
Source “Trees” (2)
• Find the source “tree” for E
  • Procedure is Dijkstra, simply remember set of next hops
  • Compile forwarding table similarly, may have set of next hops
• Straightforward to extend DV too
  • Just remember set of neighbors
[Figure: example network of nodes A to H with the ECMP link costs]
Source “Trees” (3)
[Figure: source tree for E]
E’s Forwarding Table (sets of next hops are new for ECMP):
Node | Next hops
A | B, C, D
B | B, C, D
C | C, D
D | D
E | --
F | F
G | F
H | C, D
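The “remember the set of next hops” idea can be sketched as a small change to Dijkstra: on a strictly better distance the set is replaced, on a tie it is merged. The ECMP link costs are reconstructed from the example paths and figure (an assumption), and `ecmp_next_hops` is an illustrative name.

```python
import heapq
import math

# ECMP example costs, reconstructed from the slides (an assumption):
# same links as before, but with C-D = 1, C-E = 2, D-E = 1.
ECMP_LINKS = [
    ("A", "B", 4), ("A", "E", 10), ("B", "C", 2), ("B", "E", 4),
    ("B", "F", 3), ("B", "G", 3), ("C", "D", 1), ("C", "E", 2),
    ("C", "H", 3), ("D", "E", 1), ("E", "F", 2), ("F", "G", 4),
]

def ecmp_next_hops(links, source):
    """Dijkstra that keeps every next hop lying on some shortest path."""
    graph = {}
    for u, v, cost in links:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost

    dist = {node: math.inf for node in graph}
    hops = {node: set() for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    done = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        for nbr, cost in graph[node].items():
            # Next hops usable to reach nbr through node.
            via = {nbr} if node == source else hops[node]
            if d + cost < dist[nbr]:
                dist[nbr] = d + cost
                hops[nbr] = set(via)  # strictly better: replace the set
                heapq.heappush(heap, (dist[nbr], nbr))
            elif d + cost == dist[nbr]:
                hops[nbr] |= via  # tie: merge in the extra next hops
    return hops

hops_e = ecmp_next_hops(ECMP_LINKS, "E")
for dest in sorted(hops_e):
    print(dest, sorted(hops_e[dest]))
```

Merging on ties is safe here because link costs are positive, so every predecessor on a shortest path is finished before the node it leads to.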
Forwarding with ECMP
• Could randomly pick a next hop for each packet based on destination
  • Balances load, but adds jitter
• Instead, try to send packets from a given source/destination pair on the same path
  • Source/destination pair is called a flow
  • Map flow identifier to single next hop
  • No jitter within flow, but less balanced
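Mapping a flow identifier to a single next hop is often done with a hash. The sketch below is one possible way, assuming the flow identifier is just the source/destination pair as defined above; `pick_next_hop` is a hypothetical helper, and real routers commonly hash additional header fields.

```python
import hashlib

def pick_next_hop(src, dst, next_hops):
    """Map a flow (source/destination pair) to one of the candidate next hops."""
    candidates = sorted(next_hops)  # fixed order so the choice is repeatable
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(candidates)
    return candidates[index]

# Every packet of the same flow gets the same next hop, so there is no
# reordering within a flow; different flows may land on different hops.
choice_fh = pick_next_hop("F", "H", {"C", "D"})
choice_fh_again = pick_next_hop("F", "H", {"C", "D"})
choice_eh = pick_next_hop("E", "H", {"C", "D"})
print(choice_fh, choice_fh_again, choice_eh)
```

Because the hash is deterministic, load balance depends on how many distinct flows there are: a few heavy flows can still end up on the same path.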
Forwarding with ECMP (2)
• Multipath routes from F/E to C/H
• Use both paths to get to one destination
E’s Forwarding Choices:
Flow | Possible next hops | Example choice
F → H | C, D | D
F → C | C, D | D
E → H | C, D | C
E → C | C, D | C
Border Gateway Protocol (BGP)
Structure of the Internet
• Networks (ISPs, CDNs, etc.) group their hosts under IP prefixes
• Networks are richly interconnected, often using IXPs
[Figure: ISP A (prefixes A1, A2), ISP B (prefix B1), CDN C (prefix C1), CDN D (prefix D1), Net E (prefixes E1, E2), and Net F (prefix F1), interconnected through IXPs]
Internet-wide Routing Issues
• Two problems beyond routing within a network
1. Scaling to very large networks
  • Techniques of IP prefixes, hierarchy, prefix aggregation
2. Incorporating policy decisions
  • Letting different parties choose their routes to suit their own needs
Yikes!
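Prefix aggregation can be illustrated with Python's standard `ipaddress` module. The prefixes below are documentation address blocks chosen purely for illustration; they do not come from the slides.

```python
import ipaddress

# Two adjacent /25 prefixes plus an unrelated /24 (illustrative values only).
prefixes = [
    ipaddress.ip_network("198.51.100.0/25"),
    ipaddress.ip_network("198.51.100.128/25"),
    ipaddress.ip_network("203.0.113.0/24"),
]

# collapse_addresses merges adjacent or overlapping networks into the
# smallest equivalent set, which is what a router advertises upstream.
aggregated = list(ipaddress.collapse_addresses(prefixes))
print(aggregated)
```

The two /25 prefixes collapse into a single /24, so one routing table entry stands in for two; this is the scaling benefit that aggregation provides.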
Effects of Independent Parties
• Each party selects routes to suit its own interests
  • e.g., shortest path within the ISP
• What path will be chosen for A2 → B1 and B1 → A2?
  • What is the best path?
[Figure: ISP A (prefixes A1, A2) and ISP B (prefixes B1, B2)]
Effects of Independent Parties (2)
• Selected paths are longer than overall shortest path
  • And asymmetric too!
• Consequence of independent goals and decisions, not hierarchy
[Figure: ISP A (prefixes A1, A2) and ISP B (prefixes B1, B2)]