

  1. Minimizing Electricity Cost for Geo-Distributed Interactive Services with Tail Latency Constraint Mohammad A. Islam, Anshul Gandhi, and Shaolei Ren This work was supported in part by the U.S. National Science Foundation under grants CNS-1622832, CNS-1464151, CNS-1551661, CNS-1565474, and ECCS-1610471

  2. Data centers
  • Large IT companies have data centers all over the world
  • Can exploit spatial diversity using Geographical Load Balancing (GLB)

  3. Geographical load balancing (GLB)
  [Diagram: Data Center 1, Data Center 2, Data Center 3; avg. latency t' = mean(t1, t2, t3)]

  4. Geographical load balancing (GLB)
  [Diagram: Data Center 1, Data Center 2, Data Center 3; avg. latency t' = mean(t1, t2, t3)]
  • Assumes the required data is centrally managed and replicated over all the sites

  5. GLB is facing new challenges
  [World map: N. America, Europe, Asia]
  • Tons of locally generated data
    • Smart home, IoT, edge computing

  6. GLB is facing new challenges
  [World map: N. America, Europe, Asia]
  • Tons of locally generated data
    • Smart home, IoT, edge computing
  • Limited bandwidth for large data transfers

  7. GLB is facing new challenges
  [World map: N. America, Europe, Asia]
  • Tons of locally generated data
    • Smart home, IoT, edge computing
  • Limited bandwidth for large data transfers
  • Government restrictions due to data sovereignty and privacy concerns
  ⇒ Centralized processing is not practical

  8. Geo-distributed processing is emerging

  9. Geo-distributed processing
  [Diagram: a user's request (r) is sent to data centers in multiple regions (Region 1, Region 2, Region 3)]

  10. Geo-distributed processing
  [Diagram: the user's request (r) is sent to the regional data center in each region, which processes the request and returns a response (e.g., t2 from Region 2)]

  11. Geo-distributed processing
  [Diagram: the user's request (r) is processed at the regional data center in each region]
  • Response time depends on multiple data centers: overall response t' = max(t1, t2, t3)

  12. Tail latency based SLO
  • Service providers prefer an SLO based on tail latency (i.e., response time)
  • Two parameters:
    • Percentile value (e.g., 95%, or p95)
    • Latency threshold
  • Example: an SLO of (p95, 100 ms) means 95% of the response times should be below 100 ms (see the sketch below)
  • Existing research on GLB mostly focuses on average latency
    • Zhenhua Liu [Sigmetrics'11], Darshan S. Palasamudram [SoCC'12], Kien Li [IGCC'10, SC'11], Yanwei Zhang [Middleware'11], ...
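For concreteness, a minimal sketch of how such an SLO check can be expressed, using hypothetical response-time samples (the distribution below is made up, not data from the paper):

```python
import numpy as np

# Hypothetical response-time samples in milliseconds (illustrative only).
rng = np.random.default_rng(0)
response_times_ms = rng.lognormal(mean=3.5, sigma=0.4, size=100_000)

p95 = np.percentile(response_times_ms, 95)          # 95th-percentile response time
print("p95 =", round(p95, 1), "ms, SLO met:", p95 <= 100.0)

# Equivalent probabilistic form used later in the deck: Pr(response time <= 100 ms) >= 0.95
print("Pr(d <= 100 ms) =", round(float(np.mean(response_times_ms <= 100.0)), 3))
```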

  13. Challenges of geo-distributed processing
  • How to characterize the tail latency?
    • Response time depends on multiple paths for each request
    • Includes large network latency
    • Simple queueing models for average latency, like M/M/1, cannot be used
  • How to optimize the load distribution among data centers?
  McTail: a novel GLB algorithm with data-driven profiling of tail latency

  14. Problem formulation
  • General formulation with N data centers and S traffic sources
  • minimize over a:   sum_{j=1..N} q_j · e_j(a_j)                      (total electricity cost)
  • subject to:   p_i(a, r_i) ≥ P_i^SLO,   for all i = 1, 2, ..., S     (tail latency constraint)
  • a = (a_1, a_2, ..., a_N) is the workload (requests processed) at the different data centers
  • r_i is the set of network paths from source i to all the data centers
  • p_i = Pr(d_i ≤ D_i), where d_i is the end-to-end response time at traffic source i and D_i is the delay target (e.g., 100 ms) for the tail latency
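To make the structure of this formulation concrete, here is a minimal runnable sketch; every number and function in it (the prices, the energy model, the tail-probability stub) is an illustrative placeholder rather than the paper's model:

```python
import numpy as np

price = np.array([0.04, 0.06, 0.05])      # q_j: electricity price at data center j (made up)

def energy(a):
    # e_j(a_j): placeholder energy model, an idle term plus a per-request term
    return 50.0 + 0.2 * a

def electricity_cost(a):
    # Objective: sum_{j=1..N} q_j * e_j(a_j)
    return float(np.sum(price * energy(a)))

def tail_prob(a, i):
    # p_i(a, r_i) = Pr(d_i <= D_i): stub standing in for the data-driven estimate
    # that the following slides construct from profiled latency distributions
    return 0.98 - 0.0001 * float(a.max())

a = np.array([300.0, 200.0, 100.0])       # candidate workload split across the data centers
P_SLO = 0.95
feasible = all(tail_prob(a, i) >= P_SLO for i in range(2))   # tail latency constraint per source i
print(electricity_cost(a), feasible)
```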

  15. How to determine p_i(a, r_i)?

  16. [Diagram: a user at source i connects to data center j over route r_{i,j}]
  p_{i,j}^route(a_j, r_{i,j}) is the probability that the response time over route r_{i,j} is less than D_i

  17. Same request is sent to all the data centers of a group
  [Diagram: a user at source i connects to Data Center 1, Data Center 2, and Data Center 3; the route to Data Center 2 is labeled r_{i,2}]

  18. Same request is sent to all the data centers of a group
  [Diagram: Data Centers 1, 2, and 3 together form destination group g for source i]

  19. Same request is sent to all the data centers of a group
  [Diagram: Data Centers 1, 2, and 3 form destination group g for source i]
  p_{i,g}^group(a, r_i) = p_{i,1}^route(a_1, r_{i,1}) × p_{i,2}^route(a_2, r_{i,2}) × p_{i,3}^route(a_3, r_{i,3})
  • Because of differences in data sets, random performance interference, etc., the response times over different routes can be considered uncorrelated (see the sketch below)
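A minimal sketch of this product rule under the stated independence assumption; the per-route probabilities are illustrative (they happen to reproduce the numeric example on the next slide):

```python
import numpy as np

def group_prob(route_probs):
    # p_{i,g}^group: probability that every data center in the destination group
    # responds within the delay target, assuming route response times are uncorrelated
    return float(np.prod(route_probs))

# p_{i,j}^route for the three data centers in the group (illustrative values)
print(round(group_prob([0.99, 0.98, 0.97]), 3))   # ~0.941
```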

  20. Example
  [Diagram: source i and the destination group {Data Center 1, 2, 3}, with p_{i,2}^route = 0.98 labeled on the route to Data Center 2]
  p_{i,g}^group = 0.99 × 0.98 × 0.97 ≈ 0.94
  • For requests sent to this group of data centers, 94% of the response times are less than D_i

  21. Response time probability for a source
  • G = N_1 × N_2 × ... × N_M possible destination groups, where N_m is the number of data centers in region m
  • The response time probability at source i is the load-weighted average over all G groups:
    p_i(a, r_i) = (1/Λ_i) · sum_{g=1..G} λ_{i,g} · p_{i,g}^group(a, r_i)
  • λ_{i,g} is the workload sent to destination group g
  • Λ_i = sum_{g=1..G} λ_{i,g} is the total workload from source i
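A small sketch of this load-weighted average (the group loads and probabilities below are illustrative):

```python
import numpy as np

def source_tail_prob(group_loads, group_probs):
    # p_i(a, r_i): Lambda_i-weighted average over destination groups, where
    # group_loads[g] = lambda_{i,g} and group_probs[g] = p_{i,g}^group
    loads = np.asarray(group_loads, dtype=float)
    probs = np.asarray(group_probs, dtype=float)
    return float(np.dot(loads, probs) / loads.sum())

# Example: three destination groups receiving 60%, 30%, and 10% of source i's traffic
print(source_tail_prob([60.0, 30.0, 10.0], [0.97, 0.94, 0.90]))   # 0.954
```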

  22. Updated problem formulation
  • minimize:   sum_{j=1..N} q_j · e_j(a_j)                                               (objective same as before: minimizing electricity cost)
  • subject to:   (1/Λ_i) · sum_{g=1..G} λ_{i,g} · p_{i,g}^group(a, r_i) ≥ P_i^SLO,   for all i = 1, 2, ..., S     (tail latency decomposed into route-wise latencies)
  •               sum_{g=1..G} λ_{i,g} = Λ_i,   for all i = 1, 2, ..., S                   (workload constraint)
  Need to determine p_{i,j}^route(a_j, r_{i,j}) for all routes (a toy optimization sketch follows below)
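For a sense of how the decomposed problem can be solved, here is a toy sketch with scipy; it is not McTail's algorithm. All prices, per-request energy terms, and route probabilities are made up, the energy model is linearized, and the per-route probabilities are treated as load-independent constants for simplicity (McTail's profiles are load-dependent):

```python
import numpy as np
from scipy.optimize import linprog

route_prob = np.array([0.99, 0.96, 0.98, 0.95])    # p_{i,j}^route for data centers 0..3 (made up)
cost_per_req = np.array([0.05, 0.02, 0.04, 0.01])  # q_j * (energy per request), linearized (made up)

# Destination groups: one data center per region (region A = {0, 1}, region B = {2, 3})
groups = [(0, 2), (0, 3), (1, 2), (1, 3)]
group_prob = np.array([route_prob[j] * route_prob[k] for j, k in groups])     # product rule
group_cost = np.array([cost_per_req[j] + cost_per_req[k] for j, k in groups])

total_load = 100.0    # Lambda_i for the single traffic source in this toy instance
P_SLO = 0.95

# minimize  sum_g group_cost[g] * lambda_g
# s.t.      sum_g group_prob[g] * lambda_g >= P_SLO * total_load   (tail latency constraint)
#           sum_g lambda_g = total_load                            (workload constraint)
res = linprog(
    c=group_cost,
    A_ub=[-group_prob], b_ub=[-P_SLO * total_load],
    A_eq=[np.ones(len(groups))], b_eq=[total_load],
    bounds=[(0, None)] * len(groups),
)
print(res.x, res.fun)   # optimal per-group load split and its electricity cost
```

With load-dependent profiles the problem is no longer a plain linear program, which is where McTail's profiled, load-dependent estimates enter the optimization.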

  23. Profiling the response time probability of a route
  • We need the tail latency
    • Hard to model for arbitrary workload distributions
  • Data-driven approach: profile the response time statistics (i.e., find the probability distribution) from observed data
  • Example: [Histogram: response time profile for 100K requests]

  24. Challenges of the data-driven approach
  • The response time profile of a route depends on the amount of data center workload
    • We set W discrete levels of workload for each data center
  • S × N network paths between the S sources and N data centers
  • Total S × W × N profiles
  • Need to update whenever the network latency distribution, data center configuration, or workload composition changes
  ⇒ Slow and repeated profiling

  25. Profiling response statistics for one route
  • G_{i,j}^N is the network latency distribution of route r_{i,j}
  • G_j^D(x) is the data center latency distribution under load x
  • The end-to-end latency distribution of route r_{i,j} is G_{i,j}^R = G_{i,j}^N ∗ G_j^D(x), where "∗" is the convolution operator
  Key idea: profile G_{i,j}^N and G_j^D(x) separately (a numeric sketch of the convolution follows below)
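A minimal numeric sketch of this convolution step, using made-up latency distributions discretized on a 1 ms grid:

```python
import numpy as np

# Index k of each array holds Pr(latency = k ms); both PMFs are illustrative stand-ins
# for the profiled network and data center distributions.
net = np.zeros(201); net[40:61] = 1.0 / 21     # G_{i,j}^N: network latency uniform on 40-60 ms
dc  = np.zeros(201); dc[10:61]  = 1.0 / 51     # G_j^D(x): data center latency uniform on 10-60 ms

end_to_end = np.convolve(net, dc)              # G_{i,j}^R = G_{i,j}^N * G_j^D(x)

D_ms = 100                                     # delay target D_i
p_route = end_to_end[:D_ms + 1].sum()          # p_{i,j}^route = Pr(response time <= D_i)
print(round(float(p_route), 3))                # ~0.804
```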

  26. Example
  [Figure: network latency of route r_{i,j}, convolved with the latency of data center j under load x, gives the end-to-end response profile of the route, G_{i,j}^R]

  27. Profiling response time statistics in McTail
  • S × N network route profiles
  • N × W data center profiles
  • Total (S + W) × N profiles, versus S × W × N profiles before
  • Profiling overhead
    • Only the data center profiles need updating when the workload composition and/or data center configuration changes, which is an infrequent event
    • The network latency distribution may change more frequently, but it is already monitored by service providers
    • Data overhead comparable to existing GLB studies
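For example, with the evaluation setup's 5 traffic sources and 9 data centers, and a hypothetical W = 10 workload levels per data center, joint profiling would require 5 × 10 × 9 = 450 profiles, while the decomposed approach needs 5 × 9 network profiles plus 9 × 10 data center profiles, i.e., (5 + 10) × 9 = 135.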

  28. McTail system diagram
  [Diagram: at each traffic gateway, a network profiler builds the network latency distribution G^N; at each data center, a data center profiler builds the service time distribution G^D(x) from data center utilization]

  29. McTail system diagram
  [Diagram: the gateway network profilers (G^N) and data center profilers (G^D(x)) feed McTail, together with the workload prediction Λ_i and the electricity prices q_j]

  30. McTail system diagram
  [Diagram: from the profiles G^N and G^D(x), the workload prediction Λ_i, and the electricity prices q_j, McTail computes the load distribution λ]

  31. Evaluation

  32. Evaluation setup
  • Based on Google and Facebook data center locations
  • 3 regions, 9 data centers

  33. Evaluation setup
  • Based on Google and Facebook data center locations
  • 3 regions, 9 data centers
  • 5 traffic sources
