
Rhythm: Component-distinguishable Workload Deployment in Datacenters - PowerPoint Presentation



  1. Rhythm: Component-distinguishable Workload Deployment in Datacenters
  Laiping Zhao¹, Yanan Yang¹, Kaixuan Zhang¹, Xiaobo Zhou¹, Tie Qiu¹, Keqiu Li¹, Yungang Bao²
  ¹ College of Intelligence and Computing, Tianjin University; ² Institute of Computing Technology, CAS

  2. Outline
  - Background
  - Interference on LC components
  - Rhythm Controller
  - Experimental Evaluation
  - Conclusion

  3. Background
  - Low resource utilization in datacenters
    - Aliyun: the average CPU utilization of the co-located cluster approaches 40% [Guo, 2019].
    - Improved, but utilization is still low.

  4. Background
  - Co-location: improving resource utilization
    - Interference causes unpredictable latency.
    - One approach: profile the workload and schedule in a cross-complementing way.
    - Another: real-time monitoring with passive adjustment of resource allocation.

  5. Background
  - Many-component services:
    - A single transaction spans ~40 racks of ~60 servers each. (Source: [Google, "Datacenter Computers: modern challenges in CPU design", 2015])
    - SocialNetwork service: 31 microservices; each arc is a client-server RPC. (Source: [DeathStarBench, ASPLOS'19])

  6. Problem
  - How can we feedback-control when a request is served by multiple components collaboratively?
    - Latency: L_overall = L_comp1 + L_comp2 + ...
    - Tail latency: TL_overall = g(TL_comp1, TL_comp2, ...)
  - Given an overall tail-latency target, how do we derive a sub-target for each component?
  - Or: how does component-level control affect the overall tail latency? (A small illustration of why tail latency does not simply add up follows below.)
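Slide 6's distinction can be illustrated with a tiny simulation: mean latency of a multi-component request adds up across components, but the 99th-percentile (tail) latency is only some function g of the components' behavior. This is a minimal sketch; the lognormal distributions, sample count, and the 99th-percentile choice are illustrative assumptions, not measurements from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-component service times for 100,000 requests (illustrative only).
comp1 = rng.lognormal(mean=1.0, sigma=0.6, size=100_000)
comp2 = rng.lognormal(mean=0.5, sigma=1.0, size=100_000)
overall = comp1 + comp2  # end-to-end latency = sum of component sojourn times

p99 = lambda x: np.percentile(x, 99)

# Mean latency composes additively across components.
print(np.mean(overall), np.mean(comp1) + np.mean(comp2))
# Tail latency does not: the overall 99th percentile is not the sum of the
# per-component 99th percentiles, which is what makes per-component control hard.
print(p99(overall), p99(comp1) + p99(comp2))
```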

  7. Inconsistent Interference Tolerance
  - [Figures: Redis architecture; E-commerce architecture.]
  - Components show significantly different degradation (up to ~435%) under the same source of interference.

  8. Rhythm Design
  - Rhythm insight:
    - Components with smaller contributions to the tail latency can be co-located with BE jobs aggressively.
  - Challenges:
    - How to quantify the contribution of a component?
    - How to control the BE deployment aggressively?
      - When to co-locate?
      - How many BEs can we co-locate with the LC?

  9. Rhythm
  - Inconsistent interference tolerance ability;
  - Tracking user requests.

  10. Request tracer
  - Causal path graph
    - Send/receive events: ACCEPT, RECV, SEND, CLOSE
    - Event: <type, timestamp, context identifier, message identifier>
    - Context: <hostIP, programName, processID, threadID>
    - Message: <senderIP, senderPort, receiverIP, receiverPort, messageSize> (a record sketch follows below)
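A minimal sketch of how the tuples above could be represented as trace records. The class and field names simply mirror the slide; everything else (types, units) is an assumption rather than the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class Context:
    host_ip: str
    program_name: str
    process_id: int
    thread_id: int

@dataclass
class Message:
    sender_ip: str
    sender_port: int
    receiver_ip: str
    receiver_port: int
    message_size: int

@dataclass
class Event:
    type: str          # one of ACCEPT, RECV, SEND, CLOSE
    timestamp: float   # event time, e.g. seconds since epoch (assumed unit)
    context: Context   # where the event happened (the context identifier)
    message: Message   # which message the event refers to (the message identifier)
```

Events whose message fields match can then be linked across hosts to form the causal path graph of a request.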

  11. Rhythm
  - Inconsistent interference tolerance ability;
  - Tracking user requests;
  - Servpod abstraction:
    - A collection of service components from one LC service that are deployed together on the same physical machine.
    - Used for deriving the sojourn time of each request in each server (see the sketch below).
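A sketch of deriving per-servpod sojourn times from the traced events, under the simplifying assumption that a request's sojourn time in a servpod is the gap between the RECV event that brings the request in and the SEND event that passes it on. That pairing rule, and the flat tuple format, are assumptions for illustration, not the paper's exact algorithm.

```python
from collections import defaultdict

def sojourn_times(events):
    """events: iterable of (request_id, servpod_id, event_type, timestamp) tuples."""
    pending = {}                  # (request, servpod) -> RECV timestamp
    times = defaultdict(list)     # servpod -> list of sojourn times
    for request_id, servpod_id, etype, ts in sorted(events, key=lambda e: e[3]):
        key = (request_id, servpod_id)
        if etype == "RECV":
            pending[key] = ts
        elif etype == "SEND" and key in pending:
            times[servpod_id].append(ts - pending.pop(key))
    return times

# Example: request r1 spends ~3 ms in servpod A and ~5 ms in servpod B.
trace = [
    ("r1", "A", "RECV", 0.000), ("r1", "A", "SEND", 0.003),
    ("r1", "B", "RECV", 0.004), ("r1", "B", "SEND", 0.009),
]
print(sojourn_times(trace))  # {'A': [0.003], 'B': [~0.005]}
```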

  12. Rhythm
  - Inconsistent interference tolerance ability;
  - Tracking user requests;
  - Servpod abstraction;
  - Contribution analyzing.

  13. Contribution Analyzer
  - [Figure panels: Mean; Variance.]
  - Servpods with a higher average sojourn time contribute more to the tail latency.
  - Servpods with a higher sojourn-time variance contribute more to the tail latency.
  - Servpods whose sojourn times are highly correlated with the tail latency contribute more to it.
  (A sketch of such a contribution score follows below.)
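A sketch of one way to turn the three signals above into a per-servpod contribution score: normalize the mean, the variance, and the correlation of each servpod's sojourn times with the end-to-end latency, then combine them. The equal weighting, the min-max normalization, and using end-to-end latency as a stand-in for "correlation with the tail latency" are assumptions; the paper's exact formula may differ.

```python
import numpy as np

def contributions(sojourn, overall):
    """sojourn: dict servpod -> per-request sojourn times (aligned with `overall`);
    overall: per-request end-to-end latencies."""
    pods = list(sojourn)
    mean = np.array([np.mean(sojourn[p]) for p in pods])
    var  = np.array([np.var(sojourn[p]) for p in pods])
    corr = np.array([np.corrcoef(sojourn[p], overall)[0, 1] for p in pods])

    norm = lambda x: (x - x.min()) / (x.max() - x.min() + 1e-12)
    score = (norm(mean) + norm(var) + norm(corr)) / 3   # equal weights (assumption)
    score = score / score.sum()                         # scores sum to 1 across servpods
    return dict(zip(pods, score))
```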

  14. Contribution Analyzer
  - Is this definition effective?
    - Sensitivity vs. contributions.
    - The increase in the 99th-percentile latency when a single servpod is interfered with by different BEs:
      - Mixed BEs: wordcount, imageClassify, lstm, CPU-stress, stream-dram and stream-llc.
      - DRAM intensive: stream-dram
      - CPU intensive: CPU-stress
      - LLC intensive: stream-llc

  15. Rhythm
  - Inconsistent interference tolerance ability;
  - Tracking user requests;
  - Servpod abstraction;
  - Contribution analyzing;
  - Controller:
    - [Figure: per-machine agents (Agent 2, Agent 3, Agent 4, ...) co-locating LC servpods with BE jobs.]
    - Loadlimit: allowing co-location when load < loadlimit;
      - the "knee point" of the performance-load curve.
    - Slacklimit: the lower bound of slack for allowing the growth of BEs.
      - Slack = SLA - currentTL;
      - Small contribution → larger slacklimit.

  16. Controller
  - When can we co-locate workloads?
    - Loadlimit.
  - Loadlimit per servpod:
    - The upper bound of the request load for allowing co-location with BE jobs;
    - Knee point: 76% of the maximum load for MySQL; 87% of the maximum for Tomcat (a simple knee-point sketch follows below).
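One simple way to extract such a knee point from a measured performance-load curve is sketched below. Treating the knee as the largest load whose tail latency stays within a fixed factor of the low-load latency is an assumption for illustration; the deck does not specify this particular rule.

```python
def knee_point(loads, latencies, factor=1.5):
    """loads: increasing offered loads; latencies: measured tail latency at each load.
    Returns the largest load whose latency is still within `factor` of the latency
    at the lowest load -- a simple stand-in for the knee of the curve."""
    base = latencies[0]
    knee = loads[0]
    for load, lat in zip(loads, latencies):
        if lat <= factor * base:
            knee = load
    return knee

# Illustrative curve (load in % of max, latency in ms): latency explodes past ~80%.
loads = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
lats  = [5, 5, 5, 6, 6, 7, 7, 7, 20, 80]
print(knee_point(loads, lats))  # -> 80
```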

  17. Controller
  - How many BEs can we co-locate?
    - Slacklimit: the lower bound of slack for allowing the growth of BE jobs.
    - [Figure: co-locating decisions for two servpods. Both slacklimits start at 1 (Init. Slacklimit1 = 1, Init. Slacklimit2 = 1) and are updated with (1 - contribution1) and (1 - contribution2) respectively at each co-locating decision, where Slack = SLA - currentTL.]
    - contribution1 < contribution2  =>  slacklimit1 < slacklimit2. (A sketch of the resulting per-servpod decision follows below.)
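A minimal sketch of the per-servpod decision implied by slides 15-17: BEs are only co-located while the servpod's load is below its loadlimit, BE jobs grow only while the tail-latency slack stays above the servpod's slacklimit, and otherwise BEs are cut or suspended. The action names suspendBE, cutBE, and allowBEGrowth come from the timeline slide; holdBE, the thresholds, and the treatment of slacklimit as a fraction of the SLA are assumptions, not the paper's controller.

```python
def control_step(servpod, sla, current_tl, load):
    """Decide the BE action for one servpod.

    servpod: dict with 'loadlimit' and 'slacklimit' (derived from profiling/contribution).
    sla, current_tl: overall tail-latency target and current measurement (same unit).
    load: current request load on this servpod, in the same unit as loadlimit.
    """
    slack = sla - current_tl                      # Slack = SLA - currentTL
    if slack <= 0:
        return "suspendBE"                        # tail-latency SLA already violated
    if load >= servpod["loadlimit"]:
        return "cutBE"                            # beyond the knee point: shed BEs
    if slack > servpod["slacklimit"] * sla:       # slacklimit as a fraction of SLA (assumption)
        return "allowBEGrowth"                    # enough headroom to add BE jobs
    return "holdBE"                               # keep the current BE allocation (assumed action)

# Example with hypothetical numbers (latency in ms, load in QPS).
pod = {"loadlimit": 800, "slacklimit": 0.2}
print(control_step(pod, sla=100.0, current_tl=60.0, load=500))   # allowBEGrowth
print(control_step(pod, sla=100.0, current_tl=95.0, load=500))   # holdBE
print(control_step(pod, sla=100.0, current_tl=110.0, load=500))  # suspendBE
```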

  18. Experimental Evaluation
  - Benchmarks:
    - LC services:
      - Apache Solr: Solr engine + Zookeeper
      - Elasticsearch: Index + Kibana
      - Elgg: Webserver + Memcached + MySQL
      - Redis: Master + Slave
      - E-commerce: HAProxy + Tomcat + Amoeba + MySQL
    - BE tasks:
      - CPU-Stress; Stream-LLC; Stream-DRAM
      - Iperf: network
      - LSTM: mixed
      - Wordcount
      - ImageClassify: deep learning
  - Testbed:
    - 16 sockets, 64 GB of DRAM per socket; each socket shares 20 MB of L3 cache.
    - Intel Xeon E7-4820 v4 @ 2.0 GHz: 32 KB L1 cache and 256 KB L2 cache per core.
    - Operating system: Ubuntu 14.04 with kernel version 4.4.0-31.

  19. Overall Analysis
  - [Figures: EMU; CPU utilization; memory-bandwidth utilization.]
  - Overall analysis (compared to Heracles [ISCA 2015]):
    - Improves EMU (= LC throughput + BE throughput) by 11.6%~24.6%;
    - Improves CPU utilization by 19.1%~35.3%;
    - Improves memory-bandwidth utilization by 16.8%~33.4%.

  20. Timeline Analysis
  - Timeline:
    - Time 3.3: suspendBE();
    - Time 5.6: allowBEGrowth();
    - Time 7.7: cutBE();
    - Time 9.3: suspendBE().

  21. Conclusion
  - Rhythm, a deployment controller that maximizes resource utilization while guaranteeing the LC service's tail-latency requirement.
    - Request tracer
    - Contribution analyzer
    - Controller
  - Experiments demonstrate improvements in system throughput and resource utilization.

  22. Thank you! Questions?
