
Optimizing Cost and Performance in Online Service Provider Networks



  1. Optimizing Cost and Performance in Online Service Provider Networks
     Ming Zhang, Microsoft Research
     Joint work with Zheng Zhang (Purdue), Albert Greenberg (MSR), Y. Charlie Hu (Purdue), Ratul Mahajan (MSR), and Blaine Christian (Microsoft)

  2. OSP network
     [Figure: OSP network diagram showing a user (IP prefix), ISP 1 to ISP 6, and DC 1 to DC 3 inside the OSP]

  3. Key factors in OSP traffic engineering
     • Cost
       – Google Search: 5B queries/month
       – MSN Messenger: 330M users/month
       – Traffic volume exceeding a PB/day
     • Performance
       – Directly impacts user experience and revenue
         • Purchases, search queries, ad click-through rates

  4. Current TE solution is limited
     • Current practice is mostly manual
       – Incoming: DNS redirection, nearby DC
       – Outgoing: BGP, manually configured
     • Complex TE strategy space
       – (~300K prefixes) × (~10 DCs) × (~10 routes/prefix)
       – Link capacity creates dependencies among prefixes
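
The size of the strategy space on this slide can be checked with a few lines of arithmetic. This is a minimal sketch using the slide's rounded figures; the function names are hypothetical, and the joint-strategy count assumes each prefix picks its (DC, route) pair independently, which the slide does not state explicitly.

```python
import math

# Rounded figures from the slide (all approximate).
NUM_PREFIXES = 300_000
NUM_DCS = 10
ROUTES_PER_PREFIX = 10


def choices_per_prefix(num_dcs: int, routes_per_prefix: int) -> int:
    """Number of (DC, route) options open to a single prefix."""
    return num_dcs * routes_per_prefix


def total_options(num_prefixes: int, num_dcs: int, routes_per_prefix: int) -> int:
    """Number of (prefix, DC, route) combinations, i.e. the product on the slide."""
    return num_prefixes * choices_per_prefix(num_dcs, routes_per_prefix)


def joint_strategy_digits(num_prefixes: int, num_dcs: int, routes_per_prefix: int) -> int:
    """Decimal digits in choices ** num_prefixes, computed without building the number.

    Assumes every prefix chooses independently (an assumption, not from the slide).
    """
    choices = choices_per_prefix(num_dcs, routes_per_prefix)
    return math.floor(num_prefixes * math.log10(choices)) + 1


if __name__ == "__main__":
    print(total_options(NUM_PREFIXES, NUM_DCS, ROUTES_PER_PREFIX))
    print(joint_strategy_digits(NUM_PREFIXES, NUM_DCS, ROUTES_PER_PREFIX))
```

The first number is the count of individual options to measure; the second shows why enumerating joint strategies by hand is not practical.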

  5. Prior work on TE
     • Intra-domain TE for transit ISPs
       – Balancing load across internal paths
       – Not considering end-to-end performance
     • Route selection for multi-homed stub networks
       – Single site
       – Small number of ISPs

  6. Our contributions
     • Formulation of OSP TE problem
     • Design & implementation of Entact
       – A route-injection-based measurement technique
       – An online TE optimization framework
     • Extensive evaluations in MSN
       – 40% cost reduction
       – Low operational overheads

  7. Problem formulation
     [Figure: user prefixes d1, d2, d3 mapped to DCs (DC 1, DC 2) and external links (Link 1, Link 2, Link 3)]
     • INPUT: user prefixes, DCs, & external links
     • OUTPUT: TE strategy, user prefix → (DC, external link)
     • CONSTRAINTS: link capacity, route availability
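
The input, output, and constraints above can be written down as a small data model. This is a sketch under stated assumptions: the `Link` class, the `is_feasible` helper, and the dictionary layouts are hypothetical names chosen for illustration, not Entact's actual data structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """An external link with a capacity in the same unit as traffic volume."""
    name: str
    capacity: float


def is_feasible(strategy, volume, links, available):
    """Check a TE strategy against the two constraints on the slide.

    strategy:  prefix -> (dc, link_name)         the OUTPUT mapping
    volume:    prefix -> traffic volume
    links:     link_name -> Link                 external links (INPUT)
    available: prefix -> set of (dc, link_name)  route availability
    """
    load = {name: 0.0 for name in links}
    for prefix, (dc, link_name) in strategy.items():
        # Route availability: the chosen (DC, link) must exist for this prefix.
        if link_name not in load or (dc, link_name) not in available.get(prefix, set()):
            return False
        load[link_name] += volume[prefix]
    # Link capacity: the summed load on every link must fit.
    return all(load[name] <= links[name].capacity for name in links)
```

Because several prefixes share one link's capacity, a choice that is fine for each prefix alone can still be infeasible jointly, which is the dependency among prefixes noted earlier in the deck.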

  8. Cost & performance measures
     • Use RTT as the performance measure
       – Many latency-sensitive apps: search, email, maps
       – Apps are chatty: N × RTT quickly gets to 100+ ms
     • Transit cost: F(v) = price × v
       – Ignore internal traffic cost
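
The two measures on this slide are simple enough to state directly in code. This is a minimal sketch; the function names and the example numbers in the usage note are hypothetical, and units are left to the caller.

```python
def transit_cost(price: float, volume: float) -> float:
    """F(v) = price x v for one external link."""
    return price * volume


def total_transit_cost(prices, volumes):
    """Sum F(v) over external links; internal traffic cost is ignored, as on the slide.

    prices:  link_name -> price per unit of traffic
    volumes: link_name -> traffic volume (missing links carry no traffic)
    """
    return sum(transit_cost(prices[name], volumes.get(name, 0.0)) for name in prices)


def chatty_latency_ms(round_trips: int, rtt_ms: float) -> float:
    """Latency of an exchange that needs N sequential round trips: N x RTT."""
    return round_trips * rtt_ms
```

For instance, `chatty_latency_ms(4, 30.0)` gives 120.0 ms, which illustrates how a modest RTT passes 100 ms after only a few round trips.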

  9. Measuring alternative paths with route injection
     • Minimal impact on current traffic
     • Existing approaches are inapplicable
     [Figure: AS 1 originates 5.6.7.0/24; the OSP connects to AS 2 (next hop IP2) and AS 3 (next hop IP3). A route injection daemon injects 5.6.7.8/32 with next-hop=IP3.]
     OSP routing table:
         prefix        next-hop   AS path
       * 5.6.7.0/24    IP2        AS2 AS1
                       IP3        AS3 AS1
       * 5.6.7.8/32    IP3
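
Route injection works because routers forward on the longest matching prefix, so a /32 route for one IP overrides the covering /24 for that IP only. The sketch below illustrates that lookup rule with Python's standard `ipaddress` module; `lookup` and `inject_route` are hypothetical helpers for illustration, not the route injection daemon's interface.

```python
import ipaddress


def lookup(routing_table, destination):
    """Return the next hop of the longest matching prefix, or None if nothing matches.

    routing_table: dict mapping ipaddress network objects to a next-hop label
    """
    dest = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in routing_table.items() if dest in net]
    if not matches:
        return None
    # The most specific (longest) prefix wins.
    _, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


def inject_route(routing_table, prefix, next_hop):
    """Add a more specific route; existing routes are left untouched."""
    routing_table[ipaddress.ip_network(prefix)] = next_hop


if __name__ == "__main__":
    table = {ipaddress.ip_network("5.6.7.0/24"): "IP2"}
    inject_route(table, "5.6.7.8/32", "IP3")
    print(lookup(table, "5.6.7.8"))  # probe target follows the injected route
    print(lookup(table, "5.6.7.9"))  # other traffic stays on the default path
```

Only the single probed address is diverted to the alternative path, which is why the impact on current traffic is minimal.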

  10. Selecting desirable strategy
      • M^N strategies for N prefixes and M alternative paths/prefix
      • Only consider optimal strategies
      • Finding “sweet spot” based on desirable cost-performance tradeoff
      • K extra cost for unit latency decrease
      [Figure: cost vs. weighted RTT, showing the optimal strategy curve and the sweet spot where the slope = -K]
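
One way to read the sweet spot: on a convex optimal-strategy curve, the point whose tangent has slope -K is the point that minimizes cost + K × wRTT. The sketch below picks that point from a sampled curve; the `sweet_spot` name and the sample points are hypothetical, made up for illustration rather than taken from the evaluation.

```python
def sweet_spot(curve, k):
    """Pick the point on the optimal curve that minimizes cost + k * wRTT.

    curve: list of (wrtt_ms, cost) points on the optimal strategy curve
    k:     extra cost the operator will pay per unit of latency decrease
    """
    return min(curve, key=lambda point: point[1] + k * point[0])


# Hypothetical convex curve: lower wRTT costs more.
EXAMPLE_CURVE = [(30, 300), (35, 220), (40, 170), (50, 140), (60, 130)]

if __name__ == "__main__":
    for k in (0, 5, 100):
        print(k, sweet_spot(EXAMPLE_CURVE, k))
```

With K = 0 the choice collapses to the lowest-cost strategy, and as K grows it moves toward the best-performing one, matching the two extremes of the curve.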

  11. Computing optimal strategy
      • P95 cost optimization is complex
        – Optimize short-term cost online
        – Evaluate using P95 cost
      • Reduced to an ILP problem
        – Find a fractional solution
        – Convert to an integer solution
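
The slide does not say how the fractional solution is converted to an integer one. As an illustration only, here is one simple greedy rounding heuristic (not necessarily the one Entact uses): each prefix goes to the link carrying its largest fractional share that still has room. The function name and data layout are hypothetical.

```python
def round_fractional(fractions, volume, capacity):
    """Greedily turn a fractional assignment into one link per prefix.

    fractions: prefix -> {link: share of the prefix's traffic}, shares sum to 1
    volume:    prefix -> traffic volume
    capacity:  link -> capacity, same unit as volume
    Raises ValueError if some prefix cannot be placed by this heuristic.
    """
    remaining = dict(capacity)
    assignment = {}
    # Place high-volume prefixes first; they are the hardest to fit.
    for prefix in sorted(fractions, key=lambda p: volume[p], reverse=True):
        # Prefer the link that received the largest share in the fractional solution.
        ranked = sorted(fractions[prefix].items(), key=lambda kv: kv[1], reverse=True)
        for link, share in ranked:
            if share > 0 and volume[prefix] <= remaining[link]:
                assignment[prefix] = link
                remaining[link] -= volume[prefix]
                break
        else:
            raise ValueError(f"no link with spare capacity for prefix {prefix!r}")
    return assignment
```

A greedy pass like this can fail or lose optimality where an exact ILP solver would not; it only shows the shape of the fractional-to-integer step.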

  12. Finding optimal strategy curve
      [Figure: cost vs. weighted RTT, showing the optimal strategy curve]

  13. Entact architecture
      [Figure: architecture diagram centered on ENTACT. Labels: Netflow data; Routing tables; n live IPs per prefix; n-1 injected routes per prefix; Per-prefix traffic volume; RTT of n alternative routes per prefix; Capacity & price of external links, slope K; Optimal TE strategy]

  14. Experimental setup
      • MSN: one of the largest OSP networks
        – 11 DCs, 1,000+ external links
      • Assumptions in evaluation
        – Traffic and performance do not change with TE strategies
      • 6K destination prefixes from 2,791 ASes
        – High-volume, single-location, representative

  15. Benefits of Entact
      [Figure: P95 cost vs. wRTT (msec) for Default, Entact (K=10), LowestCost (K=0), and BestPerf (K=∞)]
      • 40% cost reduction
      • Significant cost/perf tradeoff

  16. Where does cost reduction come from?
      path chosen by Entact   prefixes (%)   wRTT difference (msec)   short-term cost difference
      same                    88.2             0                         0
      cheaper & shorter        1.7            -8                      -309
      cheaper & longer         5.5           +12                      -560
      pricier & shorter        4.6           -15                       +42
      pricier & longer         0.1             0                         0
      • Entact makes “intelligent” performance-cost tradeoff
      • Automation is crucial for handling complexity & dynamics

  17. Conclusions
      • TE automation is crucial for large OSP networks
        – Multiple DCs
        – Many external links
        – Dependencies between prefixes
      • Entact: first online TE scheme for OSP networks
        – 40% cost reduction w/o performance degradation
        – Low operational overhead
