Detecting Latency Degradation Patterns in Service-based Systems

  1. Detecting Latency Degradation Patterns in Service-based Systems. Vittorio Cortellessa, Luca Traini. University of L’Aquila, Italy. 11th ACM/SPEC International Conference on Performance Engineering

  2. Challenges in Modern Distributed Systems. "Move fast" (Rubin and Rinard, 2016) vs. performance assurance: several performance issues emerge only with real live user traffic (Veeraraghavan et al., 2016). References: Julia Rubin and Martin Rinard. 2016. The challenges of staying together while moving fast: an exploratory study. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). Association for Computing Machinery, New York, NY, USA, 982–993. DOI: https://doi.org/10.1145/2884781.2884871. Kaushik Veeraraghavan, Justin Meza, David Chou, Wonho Kim, Sonia Margulis, Scott Michelson, Rajesh Nishtala, Daniel Obenshain, Dmitri Perelman, and Yee Jiun Song. 2016. Kraken: leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services. In Proceedings of the 12th USENIX conference on Operating Systems Design and Implementation (OSDI ’16). USENIX Association, USA, 635–650. ICPE2020

  3. Performance debugging in production: a fundamental activity during software evolution. Challenge: a request triggers several Remote Procedure Calls (RPCs). Workflow-centric tracing solutions are available (e.g. Zipkin [1], Jaeger [2]). [1] https://zipkin.io/ [2] https://www.jaegertracing.io/

  4. Triaging requests

  5. Triaging requests: time-consuming computation in RPC1; slow DB query in both RPC2 and RPC3

  6. Latency Degradation Patterns. Two examples: (a) getProfile execution time is > 30ms; (b) getRecommended execution time is > 20ms AND getCart execution time is > 15ms

  7. Latency Degradation Patterns. Each bound on a single RPC (e.g. getProfile execution time is > 30ms) is a condition; a pattern combines conditions (e.g. getRecommended execution time is > 20ms AND getCart execution time is > 15ms)

  8. Formal Notation. A request trace is denoted as $r = (e_0, e_1, \dots, e_m, L)$, where $e_i$ denotes the execution time of RPC $i$ and $L$ the entire request latency. A condition is denoted as $c = \langle j, e_{min}, e_{max} \rangle$, where $j$ refers to RPC $j$. A request trace $r$ satisfies $c$, denoted as $r \triangleleft c$, if $r = (\dots, e_j, \dots)$ and $e_{min} \le e_j < e_{max}$
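The notation on this slide can be sketched in Python. This is a minimal illustration, assuming a trace is stored as a tuple of RPC execution times followed by the total latency; the names `Condition` and `satisfies` and the sample values are hypothetical and do not come from the slides.

```python
from typing import NamedTuple, Sequence


class Condition(NamedTuple):
    j: int        # index of the RPC the condition refers to
    e_min: float  # inclusive lower bound on the execution time
    e_max: float  # exclusive upper bound on the execution time


def satisfies(r: Sequence[float], c: Condition) -> bool:
    """True if trace r = (e_0, ..., e_m, L) satisfies c: e_min <= e_j < e_max."""
    return c.e_min <= r[c.j] < c.e_max


# Hypothetical trace: three RPC execution times (ms) followed by the latency L.
r = (300, 220, 120, 490)
c = Condition(j=1, e_min=200, e_max=235)
print(satisfies(r, c))
```

The half-open interval mirrors the slide: the lower bound is inclusive and the upper bound exclusive, so adjacent conditions on the same RPC never overlap.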

  9. Formal Notation. A pattern is denoted as $P = \{c_0, c_1, \dots, c_k\}$, where each $c_i$ is a condition and $k > 0$. A request trace $r$ satisfies a pattern $P$, denoted as $r \triangleleft P$, if $r \triangleleft c_i$ for every $c_i \in P$
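A pattern check can be sketched as follows, assuming a trace satisfies a pattern when it satisfies every condition in it (as the AND in the earlier getRecommended / getCart example suggests). The RPC indices, function names, and sample traces are hypothetical; the "> 20ms" and "> 15ms" bounds are approximated with inclusive lower bounds, as in the formal notation.

```python
import math

# Hypothetical positions of the RPCs inside a trace tuple.
GET_PROFILE, GET_RECOMMENDED, GET_CART = 0, 1, 2


def satisfies_condition(r, c):
    """c = (j, e_min, e_max); true when e_min <= e_j < e_max."""
    j, e_min, e_max = c
    return e_min <= r[j] < e_max


def satisfies_pattern(r, pattern):
    """Assumed semantics: every condition of the pattern must hold."""
    return all(satisfies_condition(r, c) for c in pattern)


# getRecommended >= 20ms AND getCart >= 15ms, with no upper bound.
pattern = {(GET_RECOMMENDED, 20, math.inf), (GET_CART, 15, math.inf)}

slow_trace = (12, 25, 18, 70)   # (getProfile, getRecommended, getCart, L)
fast_trace = (12, 25, 10, 60)
print(satisfies_pattern(slow_trace, pattern), satisfies_pattern(fast_trace, pattern))
```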

  10. Formal notation: the latency interval considered as degraded is denoted as $I$

  11. Precision and recall

  12. Precision and recall

  13. F-score
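The formulas on these slides did not survive extraction. The sketch below uses the standard definitions as an assumption: a trace is a positive when its latency falls in the degraded interval, and a predicted positive when it satisfies the pattern. The function name, the half-open interval, the use of F1 (rather than a weighted F-score), and the sample traces are all illustrative choices.

```python
def precision_recall_f1(traces, matches, lat_min, lat_max):
    """traces: tuples ending with the latency L; matches: trace -> bool."""
    tp = fp = fn = 0
    for t in traces:
        degraded = lat_min <= t[-1] < lat_max  # assumed half-open interval
        matched = matches(t)
        if matched and degraded:
            tp += 1
        elif matched:
            fp += 1
        elif degraded:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Illustrative traces: (RPC1, RPC2, RPC3, L) in ms.
traces = [(300, 220, 120, 490), (330, 250, 125, 530), (320, 235, 140, 495),
          (350, 230, 130, 510), (340, 240, 125, 515)]
p, r, f = precision_recall_f1(traces, lambda t: t[1] >= 235, 500, 600)
print(p, r, f)
```

With these traces the check "RPC2 >= 235" matches two of the three degraded requests and one non-degraded request, so precision and recall are both 2/3.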

  14. Sub-interval analysis [figure: latency range between $s_{min}$ and $s_{max}$]

  15. Splitting the interval. We split the latency range with a set of potential split points. Split points are derived using Mean Shift (Comaniciu and Meer, 2002). Reference: Dorin Comaniciu and Peter Meer. 2002. Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 5 (May 2002), 603–619. DOI: https://doi.org/10.1109/34.1000236
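The slide does not say how split points are obtained from the Mean Shift result. The sketch below is one possible reading, not the authors' procedure: it runs a plain 1-D mean shift with a flat kernel over the observed latencies and places a split point midway between neighbouring latencies that converge to different modes. The function names, the bandwidth, and the sample latencies are assumptions.

```python
def mean_shift_modes(values, bandwidth, max_iter=100, tol=1e-6):
    """Shift each value to the mean of its neighbourhood until it stops moving."""
    modes = []
    for x in values:
        for _ in range(max_iter):
            window = [v for v in values if abs(v - x) <= bandwidth]
            shifted = sum(window) / len(window)
            if abs(shifted - x) < tol:
                break
            x = shifted
        modes.append(x)
    return modes


def split_points(latencies, bandwidth):
    """Midpoints between consecutive latencies that belong to different modes."""
    values = sorted(latencies)
    modes = mean_shift_modes(values, bandwidth)
    splits = []
    for i in range(len(values) - 1):
        if abs(modes[i + 1] - modes[i]) > bandwidth / 2:
            splits.append((values[i] + values[i + 1]) / 2)
    return splits


latencies = [100, 102, 104, 200, 202, 204, 500, 505]  # illustrative, in ms
print(split_points(latencies, bandwidth=20))
```

For real workloads a library implementation such as scikit-learn's `MeanShift` would be the more usual choice; the hand-written version is used here only to keep the example dependency-free.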

  16. Optimal split. Find subset of split points such that

  17. Optimization Problem: main problem and sub-problem

  18. Main Problem: Dynamic Programming approach (Krushevskaja and Sandler, 2013). Reference: Darja Krushevskaja and Mark Sandler. 2013. Understanding latency variations of black box services. In Proceedings of the 22nd international conference on World Wide Web (WWW ’13). Association for Computing Machinery, New York, NY, USA, 703–714. DOI: https://doi.org/10.1145/2488388.2488450
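The objective of the main problem is not legible in this transcript, so the following is only a generic sketch of the dynamic-programming idea, assuming the goal is to choose a subset of the candidate split points that maximizes the sum of per-sub-interval scores. The function `optimal_split` and the `score(i, j)` callback (quality of the best pattern found for the sub-interval between split points i and j) are hypothetical, and the scores in the example are made up.

```python
def optimal_split(n_points, score):
    """Pick split-point indices 0 = i_0 < ... < i_k = n_points - 1 maximizing
    the sum of score(i_t, i_{t+1}) over consecutive chosen indices."""
    best = [0.0] + [float("-inf")] * (n_points - 1)
    prev = [None] * n_points
    for j in range(1, n_points):
        for i in range(j):
            candidate = best[i] + score(i, j)
            if candidate > best[j]:
                best[j], prev[j] = candidate, i
    # Walk the back-pointers from the last split point to the first one.
    chosen = [n_points - 1]
    while prev[chosen[-1]] is not None:
        chosen.append(prev[chosen[-1]])
    return best[-1], chosen[::-1]


# Made-up scores for the sub-intervals between four candidate split points.
toy_scores = {(0, 1): 0.2, (1, 2): 0.3, (2, 3): 0.9,
              (0, 2): 0.8, (1, 3): 0.5, (0, 3): 0.6}
total, chosen = optimal_split(4, lambda i, j: toy_scores[(i, j)])
print(total, chosen)
```

In the toy example the best choice keeps split points 0, 2 and 3, i.e. it merges the first two candidate sub-intervals and keeps the last one separate.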

  19. Sub-problem: Genetic Algorithm. Representation: a condition is encoded as $\langle j, e_{min}, e_{max} \rangle$ and a pattern as a sequence of conditions. Fitness. Mutation: randomly add / remove / change a condition. Crossover: merge $P_1$ and $P_2$ in $P_3 = P_1 \cup P_2$, then randomly split $P_3$ in $P_1'$ and $P_2'$. Evolution strategy: $(\mu + \lambda)$ evolution strategy (Beyer and Schwefel, 2002). Reference: Hans-Georg Beyer and Hans-Paul Schwefel. 2002. Evolution strategies – A comprehensive introduction. Natural Computing: an international journal 1, 1 (May 2002), 3–52. DOI: https://doi.org/10.1023/A:1015059928466
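The operators named on this slide could look roughly like the sketch below. It assumes patterns are sets of `(j, e_min, e_max)` tuples and that mutation draws new conditions from a pool of candidates; the function names, the candidate pool, and the use of set size as a stand-in fitness are illustrative, not the authors' implementation.

```python
import math
import random


def crossover(p1, p2, rng):
    """Merge both parents, then randomly split the union into two children."""
    merged = sorted(p1 | p2)
    rng.shuffle(merged)
    cut = rng.randint(0, len(merged))
    return frozenset(merged[:cut]), frozenset(merged[cut:])


def mutate(pattern, candidates, rng):
    """Randomly add, remove, or change one condition."""
    conds = set(pattern)
    op = rng.choice(["add", "remove", "change"] if conds else ["add"])
    if op in ("remove", "change"):
        conds.remove(rng.choice(sorted(conds)))
    if op in ("add", "change"):
        conds.add(rng.choice(candidates))
    return frozenset(conds)


def mu_plus_lambda_select(parents, offspring, fitness, mu):
    """(mu + lambda) survivor selection: best mu of parents and offspring."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]


# Hypothetical conditions (j, e_min, e_max) and parent patterns.
CANDIDATES = [(0, 300, 330), (1, 235, math.inf), (2, 120, 130), (2, 130, 141)]
P1 = frozenset({CANDIDATES[0], CANDIDATES[1]})
P2 = frozenset({CANDIDATES[1], CANDIDATES[2]})

rng = random.Random(0)
child_a, child_b = crossover(P1, P2, rng)
mutant = mutate(P1, CANDIDATES, rng)
survivors = mu_plus_lambda_select([P1, P2], [child_a, child_b, mutant], len, mu=2)
```

In a (μ + λ) strategy parents compete with their offspring, so the best fitness in the population never decreases from one generation to the next.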

  20. Fitness evaluation. Performance critical operation: checking a set of inequalities against the traces.
        RPC1  RPC2  …  RPCn  L
        300   220   …  120   490
        330   250   …  125   530
        …     …     …  …     …
        320   235   …  140   495
        350   230   …  130   500

  21. Optimizing fitness evaluation. Intuition: the same checks are repeated several times during the evolution process. Our solution: meaningful checks are computed and stored upfront, then reused during the evolution process

  22. Precomputation. Example with $(s_{min}, s_{max}) = (500, 600)$ and the check "RPC2 execution time ≥ 235":
        RPC1  RPC2  RPC3  L    RPC2 ≥ 235
        300   220   120   490  False
        330   250   125   530  True
        320   235   140   495  True
        350   230   130   510  False
        340   240   125   515  True
      Check results are stored as key-value pairs, with one bitstring for the positives and one for the negatives. KEYS: <RPC1, 223>, <RPC2, 235>, <RPC2, 300>, … VALUES: e.g. <RPC2, 235> maps to <011, 10>

  23. Fast fitness evaluation using bitwise operators. $P = \{c_0, c_1, \dots, c_k\}$. KEYS and VALUES: $\langle j, e_{min} \rangle \to \langle B^{pos}_{min}, B^{neg}_{min} \rangle$; $\langle j, e_{max} \rangle \to \langle B^{pos}_{max}, B^{neg}_{max} \rangle$; …
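The precomputation and bitwise evaluation can be sketched as below. This is a reconstruction under assumptions, not the authors' code: each key `(j, v)` stores two integer bitmasks marking the positive and negative traces with `e_j >= v`, a condition `(j, e_min, e_max)` is evaluated as "mask for e_min AND NOT mask for e_max", and a pattern ANDs its conditions together. The traces reuse the numbers from the precomputation slide; the bit order and all names are illustrative and need not match the bitstrings shown there.

```python
import math

# (RPC1, RPC2, RPC3, L) rows taken from the precomputation slide.
TRACES = [(300, 220, 120, 490), (330, 250, 125, 530), (320, 235, 140, 495),
          (350, 230, 130, 510), (340, 240, 125, 515)]


def ge_mask(rows, j, v):
    """Bit i is set when row i has e_j >= v."""
    mask = 0
    for i, row in enumerate(rows):
        if row[j] >= v:
            mask |= 1 << i
    return mask


def precompute(traces, lat_min, lat_max):
    """Store one (positives, negatives) mask pair per (RPC index, threshold)."""
    pos = [t for t in traces if lat_min <= t[-1] < lat_max]
    neg = [t for t in traces if not lat_min <= t[-1] < lat_max]
    table = {}
    for j in range(len(traces[0]) - 1):
        # Thresholds: every observed value, plus infinity for "no upper bound".
        for v in {t[j] for t in traces} | {math.inf}:
            table[(j, v)] = (ge_mask(pos, j, v), ge_mask(neg, j, v))
    return table, len(pos), len(neg)


def evaluate(pattern, table, n_pos, n_neg):
    """Return (true positives, false positives) using only table lookups."""
    pos_mask, neg_mask = (1 << n_pos) - 1, (1 << n_neg) - 1
    for j, e_min, e_max in pattern:  # thresholds must be keys of the table
        lo_pos, lo_neg = table[(j, e_min)]
        hi_pos, hi_neg = table[(j, e_max)]
        pos_mask &= lo_pos & ~hi_pos  # e_min <= e_j and not e_j >= e_max
        neg_mask &= lo_neg & ~hi_neg
    return bin(pos_mask).count("1"), bin(neg_mask).count("1")


table, n_pos, n_neg = precompute(TRACES, 500, 600)
print(evaluate([(1, 235, math.inf)], table, n_pos, n_neg))
```

Once the table is built, evaluating a pattern touches two dictionary entries per condition instead of re-scanning every trace, which is where the saving during evolution would come from.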
