Detecting Latency Degradation Patterns in Service-based Systems. Vittorio Cortellessa, Luca Traini, University of L’Aquila, Italy. 11th ACM/SPEC International Conference on Performance Engineering
Challenges in Modern Distributed Systems. Moving fast (Rubin and Rinard, 2016) vs. performance assurance: several performance issues surface only under real live user traffic (Veeraraghavan et al., 2016).
Julia Rubin and Martin Rinard. 2016. The challenges of staying together while moving fast: an exploratory study. In Proceedings of the 38th International Conference on Software Engineering (ICSE ’16). Association for Computing Machinery, New York, NY, USA, 982–993. DOI: https://doi.org/10.1145/2884781.2884871
Kaushik Veeraraghavan, Justin Meza, David Chou, Wonho Kim, Sonia Margulis, Scott Michelson, Rajesh Nishtala, Daniel Obenshain, Dmitri Perelman, and Yee Jiun Song. 2016. Kraken: leveraging live traffic tests to identify and resolve resource utilization bottlenecks in large scale web services. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation (OSDI ’16). USENIX Association, USA, 635–650.
ICPE2020
Performance debugging in production. A fundamental activity during software evolution. Challenge: a single request triggers several Remote Procedure Calls (RPCs). Workflow-centric tracing solutions are available (e.g., Zipkin [1], Jaeger [2]). [1] https://zipkin.io/ [2] https://www.jaegertracing.io/
Triaging requests
Triaging requests: time-consuming computation in RPC1; slow DB query in both RPC2 and RPC3.
Latency Degradation Patterns. Two example patterns: (1) getProfile execution time > 30 ms; (2) getRecommended execution time > 20 ms AND getCart execution time > 15 ms. Each clause, such as "getCart execution time > 15 ms", is a condition; a pattern is the conjunction of its conditions.
Formal Notation. A request trace is denoted as $r = (e_0, e_1, \ldots, e_m, L)$, where $e_i$ denotes the execution time of RPC $i$ and $L$ the entire request latency. A condition is denoted as $c = \langle j, e_{min}, e_{max} \rangle$, where $j$ refers to RPC $j$. A request trace $r$ satisfies $c$, denoted as $r \models c$, if $r = (\ldots, e_j, \ldots)$ and $e_{min} \le e_j < e_{max}$.
Formal Notation. A pattern is denoted as $P = \{c_0, c_1, \ldots, c_k\}$, where each $c_i$ is a condition and $k > 0$. A request trace $r$ satisfies a pattern $P$, denoted as $r \models P$, if $r \models c_i$ for every $c_i \in P$.
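A minimal Python sketch of the two satisfaction relations, assuming a trace is a tuple $(e_0, \ldots, e_m, L)$ indexed by RPC position. The `Condition` class and `satisfies_pattern` function are hypothetical names; the bounds follow the half-open rule $e_{min} \le e_j < e_{max}$, and a pattern is checked as a conjunction.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Condition:
    """c = <j, e_min, e_max>: a bound on the execution time of RPC j."""
    rpc: int        # index j into the trace
    e_min: float    # inclusive lower bound
    e_max: float    # exclusive upper bound

    def satisfied_by(self, trace: Sequence[float]) -> bool:
        # trace = (e_0, e_1, ..., e_m, L); only e_j is inspected
        return self.e_min <= trace[self.rpc] < self.e_max


def satisfies_pattern(trace: Sequence[float], pattern: Iterable[Condition]) -> bool:
    """r satisfies P when it satisfies every condition in P (a conjunction)."""
    return all(c.satisfied_by(trace) for c in pattern)


# Hypothetical RPC indices: 0 = getProfile, 1 = getRecommended, 2 = getCart
trace = (35.0, 25.0, 18.0, 120.0)   # e_0, e_1, e_2, L
pattern = {Condition(1, 20.0, math.inf), Condition(2, 15.0, math.inf)}
print(satisfies_pattern(trace, pattern))   # True
```

An unbounded upper limit is written as `math.inf`. Because the lower bound is inclusive, a strict threshold such as "> 20 ms" maps onto $e_{min} = 20$ only approximately.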
Formal Notation. The latency interval considered as degraded is denoted as $J$.
Precision and recall
F-score
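The slides define precision, recall and F-score for a pattern with respect to $J$, but the formulas did not survive extraction. The sketch below assumes the usual definitions: positives are traces whose latency $L$ lies in $J$ (taken here as the half-open range $[t_{min}, t_{max})$, an assumption), precision is the fraction of traces satisfying $P$ that are positive, recall is the fraction of positives that satisfy $P$, and the F-score is their harmonic mean. Function names and the tuple encoding of conditions are hypothetical.

```python
import math
from typing import Iterable, List, Sequence, Tuple

Condition = Tuple[int, float, float]   # hypothetical encoding of <j, e_min, e_max>


def satisfies(trace: Sequence[float], pattern: Iterable[Condition]) -> bool:
    """r satisfies P when e_min <= e_j < e_max holds for every condition."""
    return all(e_min <= trace[j] < e_max for j, e_min, e_max in pattern)


def f_score(traces: List[Sequence[float]], pattern: List[Condition],
            t_min: float, t_max: float) -> Tuple[float, float, float]:
    """(precision, recall, F-score) of `pattern` w.r.t. J = [t_min, t_max)."""
    in_j = [t_min <= tr[-1] < t_max for tr in traces]   # L is the last field
    sat = [satisfies(tr, pattern) for tr in traces]
    tp = sum(1 for s, d in zip(sat, in_j) if s and d)   # satisfy P and degraded
    n_sat, n_deg = sum(sat), sum(in_j)
    precision = tp / n_sat if n_sat else 0.0
    recall = tp / n_deg if n_deg else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Traces (RPC1, RPC2, RPC3, L) from the precomputation slide; RPC2 has index 1
TRACES = [(300, 220, 120, 490), (330, 250, 125, 530), (320, 235, 140, 495),
          (350, 230, 130, 510), (340, 240, 125, 515)]
print(f_score(TRACES, [(1, 235, math.inf)], 500, 600))   # all three about 2/3
```

With the pattern "RPC2 execution time ≥ 235" and $J = [500, 600)$, three traces satisfy the pattern, three are degraded, and two are both, so precision, recall and F-score all equal 2/3.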
Sub-interval analysis of the latency range between $t_{min}$ and $t_{max}$.
Splitting the interval. We split the latency range using a set of potential split points, derived with Mean Shift (Comaniciu and Meer, 2002).
Dorin Comaniciu and Peter Meer. 2002. Mean Shift: A Robust Approach Toward Feature Space Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 24, 5 (May 2002), 603–619. DOI: https://doi.org/10.1109/34.1000236
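The slides do not say how the Mean Shift output becomes split points. One plausible reading, sketched here as an assumption: run a flat-kernel, one-dimensional Mean Shift over the observed latencies, treat latencies that converge to the same mode as one cluster, and place a candidate split point halfway between neighbouring clusters. The bandwidth, the `bandwidth / 2` jump threshold, and the helper names are all hypothetical.

```python
from typing import List, Sequence


def mean_shift_modes(xs: Sequence[float], bandwidth: float,
                     tol: float = 1e-6, max_iter: int = 300) -> List[float]:
    """Flat-kernel 1-D Mean Shift: move each point to the mean of the
    samples within `bandwidth` until it stops moving; return the modes."""
    modes = []
    for x in xs:
        m = x
        for _ in range(max_iter):
            window = [v for v in xs if abs(v - m) <= bandwidth]
            new_m = sum(window) / len(window)   # window is never empty
            if abs(new_m - m) < tol:
                break
            m = new_m
        modes.append(m)
    return modes


def split_points(latencies: Sequence[float], bandwidth: float) -> List[float]:
    """Hypothetical rule: a candidate split point lies halfway between two
    neighbouring latencies whose modes differ by more than bandwidth / 2."""
    xs = sorted(latencies)
    modes = mean_shift_modes(xs, bandwidth)
    return [(xs[i - 1] + xs[i]) / 2
            for i in range(1, len(xs))
            if abs(modes[i] - modes[i - 1]) > bandwidth / 2]


print(split_points([100, 102, 98, 101, 300, 305, 298, 500, 510], bandwidth=20))
# [200.0, 402.5]
```

The quadratic window scan keeps the sketch short; a production version would more likely use a library implementation such as scikit-learn's `MeanShift`.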
Optimal split: find a subset of the split points such that …
Optimization problem: a main problem and a sub-problem.
Main problem: dynamic programming approach (Krushevskaja and Sandler, 2013).
Darja Krushevskaja and Mark Sandler. 2013. Understanding latency variations of black box services. In Proceedings of the 22nd International Conference on World Wide Web (WWW ’13). Association for Computing Machinery, New York, NY, USA, 703–714. DOI: https://doi.org/10.1145/2488388.2488450
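A sketch of the dynamic-programming step, under an assumption not stated on the slide: the objective is to maximize the sum of per-sub-interval scores, where `score(i, j)` would be the fitness of the best pattern the sub-problem finds for the sub-interval between candidate points i and j. This is the standard O(n²) interval-partitioning recurrence $best_j = \max_{i<j} (best_i + score(i, j))$; the score table in the example is made up.

```python
from typing import Callable, List, Sequence, Tuple


def optimal_split(points: Sequence[float],
                  score: Callable[[int, int], float]) -> Tuple[float, List[float]]:
    """points: sorted candidate boundaries, first = t_min and last = t_max.
    score(i, j): value of the sub-interval [points[i], points[j]).
    Returns the best total score and the boundaries that achieve it."""
    n = len(points)
    best = [float("-inf")] * n   # best[j]: best total covering points[0]..points[j]
    prev = [-1] * n              # prev[j]: previous boundary in that solution
    best[0] = 0.0
    for j in range(1, n):
        for i in range(j):
            candidate = best[i] + score(i, j)
            if candidate > best[j]:
                best[j], prev[j] = candidate, i
    chosen, j = [], n - 1        # walk back from t_max to t_min
    while j != -1:
        chosen.append(points[j])
        j = prev[j]
    return best[-1], chosen[::-1]


# Made-up scores for the six possible sub-intervals over four candidate points
table = {(0, 1): 0.3, (1, 2): 0.2, (0, 2): 0.9, (2, 3): 0.8, (1, 3): 0.4, (0, 3): 1.0}
print(optimal_split([500, 520, 560, 600], lambda i, j: table[(i, j)]))
# keeps 560 and drops 520, total about 1.7
```

Each call to `score` hides a full sub-problem run, which is why the number of candidate split points matters for the overall cost.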
Sub-problem: Genetic Algorithm, defined by its representation, fitness, mutation, crossover and evolution strategy.
Representation: a condition is a triple $\langle k, e_{min}, e_{max} \rangle$; a pattern is a set of conditions $d_1, d_2, d_3, \ldots$
Mutation: randomly add, remove or change a condition.
Crossover: merge $Q_1$ and $Q_2$ into $Q_3 = Q_1 \cup Q_2$, then randomly split $Q_3$ into $Q_1'$ and $Q_2'$.
Evolution strategy: $(\nu + \mu)$ evolution strategy [1].
[1] Hans-Georg Beyer and Hans-Paul Schwefel. 2002. Evolution strategies – A comprehensive introduction. Natural Computing: an international journal 1, 1 (May 2002), 3–52. DOI: https://doi.org/10.1023/A:1015059928466
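The operators on this slide can be sketched as follows. Assumptions: conditions are tuples `(k, e_min, e_max)`, candidate bounds come from a hypothetical per-RPC `thresholds` list, the fitness is any callable (for example the F-score of the pattern on the degraded interval), and the $(\nu + \mu)$ strategy is read as plus-selection, in which parents and offspring compete and the best `n_parents` survive. How $\nu$ and $\mu$ map onto the parent and offspring counts is our assumption, and all names are hypothetical.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

Condition = Tuple[int, float, float]   # hypothetical encoding of <k, e_min, e_max>
Pattern = FrozenSet[Condition]


def random_condition(rng: random.Random, thresholds: Dict[int, List[float]]) -> Condition:
    """Draw an RPC k and two distinct candidate bounds for it."""
    k = rng.choice(sorted(thresholds))
    e_min, e_max = sorted(rng.sample(thresholds[k], 2))
    return (k, e_min, e_max)


def mutate(p: Pattern, rng: random.Random, thresholds: Dict[int, List[float]]) -> Pattern:
    """Randomly add, remove, or change one condition (never empties P)."""
    conds = list(p)
    op = rng.choice(["add", "remove", "change"]) if conds else "add"
    if op == "add":
        conds.append(random_condition(rng, thresholds))
    elif op == "remove" and len(conds) > 1:
        conds.pop(rng.randrange(len(conds)))
    else:  # "change", or "remove" on a single-condition pattern
        conds[rng.randrange(len(conds))] = random_condition(rng, thresholds)
    return frozenset(conds)


def crossover(q1: Pattern, q2: Pattern, rng: random.Random) -> Tuple[Pattern, Pattern]:
    """Merge Q1 and Q2 into Q3 = Q1 ∪ Q2, then randomly split Q3 in two."""
    q3 = sorted(q1 | q2)
    if len(q3) < 2:
        return q1, q2
    rng.shuffle(q3)
    cut = rng.randint(1, len(q3) - 1)   # both halves stay non-empty
    return frozenset(q3[:cut]), frozenset(q3[cut:])


def evolve(fitness: Callable[[Pattern], float], thresholds: Dict[int, List[float]],
           n_parents: int = 10, n_offspring: int = 20,
           generations: int = 50, seed: int = 0) -> Pattern:
    """Plus-selection evolution strategy: parents and offspring compete."""
    rng = random.Random(seed)
    pop = [frozenset({random_condition(rng, thresholds)}) for _ in range(n_parents)]
    for _ in range(generations):
        offspring: List[Pattern] = []
        while len(offspring) < n_offspring:
            a, b = rng.choice(pop), rng.choice(pop)
            offspring.extend(mutate(c, rng, thresholds) for c in crossover(a, b, rng))
        pop = sorted(set(pop) | set(offspring), key=fitness, reverse=True)[:n_parents]
    return pop[0]
```

Seeding a private `random.Random` keeps runs reproducible, and plus-selection guarantees that the best fitness never decreases from one generation to the next.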
Fitness evaluation. A performance-critical operation: it amounts to checking a set of inequalities against every request trace.
RPC1   RPC2   …   RPCn   L
300    220    …   120    490
330    250    …   125    530
…      …      …   …      …
320    235    …   140    495
350    230    …   130    500
Optimizing fitness evaluation. Intuition: the same checks are repeated many times during the evolution process. Our solution: meaningful checks are computed and stored upfront, then reused during the evolution process.
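Precomputation. Example traces, with the interval under analysis $(t_{min}, t_{max}) = (500, 600)$:
RPC1   RPC2   RPC3   L
300    220    120    490
330    250    125    530
320    235    140    495
350    230    130    510
340    240    125    515
Traces whose latency falls in the interval are positives, the others negatives. For the check "RPC2 execution time ≥ 235", each trace yields True or False, and the outcomes are stored as bit vectors in a key-value store: key ⟨RPC2, 235⟩ → value ⟨011, 10⟩ (positives, negatives). Other keys, such as ⟨RPC1, 223⟩ and ⟨RPC2, 300⟩, are stored in the same way.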
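A sketch of the precomputation, assuming that the "meaningful checks" are $e_j \ge$ threshold for every observed value of each RPC, and that the interval is half-open. Bit vectors are Python integers with bit i standing for the i-th positive (or negative) trace in table order; that ordering is our choice and may differ from the one on the slide. Function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Trace = Sequence[float]   # (e_0, ..., e_m, L)


def to_bits(group: List[Trace], rpc: int, threshold: float) -> int:
    """Bit i is set when the i-th trace in `group` has e_rpc >= threshold."""
    bits = 0
    for i, trace in enumerate(group):
        if trace[rpc] >= threshold:
            bits |= 1 << i
    return bits


def precompute(traces: List[Trace], t_min: float, t_max: float):
    """Return ({(rpc, threshold): (pos_bits, neg_bits)}, n_pos, n_neg)."""
    def in_interval(tr: Trace) -> bool:
        return t_min <= tr[-1] < t_max          # assumed half-open interval

    pos = [tr for tr in traces if in_interval(tr)]
    neg = [tr for tr in traces if not in_interval(tr)]
    store: Dict[Tuple[int, float], Tuple[int, int]] = {}
    for rpc in range(len(traces[0]) - 1):        # last field is L
        for threshold in sorted({tr[rpc] for tr in traces}):   # observed values
            store[(rpc, threshold)] = (to_bits(pos, rpc, threshold),
                                       to_bits(neg, rpc, threshold))
    return store, len(pos), len(neg)


# The five traces from the slide (RPC1, RPC2, RPC3, L); RPC2 has index 1 here
TRACES = [(300, 220, 120, 490), (330, 250, 125, 530), (320, 235, 140, 495),
          (350, 230, 130, 510), (340, 240, 125, 515)]
store, n_pos, n_neg = precompute(TRACES, 500, 600)
print(bin(store[(1, 235)][0]), bin(store[(1, 235)][1]))   # 0b101 0b10
```

The counts agree with the slide (two of three positives and one of two negatives pass "RPC2 ≥ 235"); only the bit order of the positives differs. A single "≥ threshold" table is enough because $e_j < e_{max}$ is the negation of $e_j \ge e_{max}$.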
Fast fitness evaluation using bitwise operators. For a pattern $P = \{c_0, c_1, \ldots, c_k\}$, each condition $\langle j, e_{min}, e_{max} \rangle$ is resolved with two lookups in the key-value store: key $\langle j, e_{min} \rangle$ returns $\langle C_{min}^{pos}, C_{min}^{neg} \rangle$ and key $\langle j, e_{max} \rangle$ returns $\langle C_{max}^{pos}, C_{max}^{neg} \rangle$. The retrieved bit vectors are then combined with bitwise operators.
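With the key-value store from the precomputation slide, evaluating a pattern needs no scan over traces: a condition holds exactly when $e_j \ge e_{min}$ and not $e_j \ge e_{max}$, so its bit vector is `C_min & ~C_max`, and a pattern ANDs the vectors of its conditions. Counting set bits then yields true and false positives. The sketch hard-codes a few entries built from the five example traces (bit i = i-th positive or negative trace in table order), treats an unbounded $e_{max}$ as an all-zero vector, and assumes the usual F-score; all names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

Store = Dict[Tuple[int, float], Tuple[int, int]]   # (rpc, threshold) -> (pos, neg)

EXAMPLE_STORE: Store = {
    (1, 235): (0b101, 0b10),   # RPC2 >= 235
    (1, 250): (0b001, 0b00),   # RPC2 >= 250
    (2, 130): (0b010, 0b10),   # RPC3 >= 130
}


def lookup(store: Store, rpc: int, threshold: float) -> Tuple[int, int]:
    # e >= +inf never holds, so an unbounded e_max maps to all-zero vectors
    if threshold == math.inf:
        return 0, 0
    return store[(rpc, threshold)]


def fast_f_score(store: Store, pattern: Iterable[Tuple[int, float, float]],
                 n_pos: int, n_neg: int) -> float:
    pos, neg = (1 << n_pos) - 1, (1 << n_neg) - 1   # start: every trace matches
    for rpc, e_min, e_max in pattern:
        min_pos, min_neg = lookup(store, rpc, e_min)
        max_pos, max_neg = lookup(store, rpc, e_max)
        # e_min <= e_j < e_max  <=>  (e_j >= e_min) AND NOT (e_j >= e_max)
        pos &= min_pos & ~max_pos
        neg &= min_neg & ~max_neg
    tp, fp = bin(pos).count("1"), bin(neg).count("1")
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / n_pos if n_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(fast_f_score(EXAMPLE_STORE, [(1, 235, 250)], n_pos=3, n_neg=2))   # about 0.4
```

For "235 ≤ RPC2 < 250" one positive and one negative match, giving precision 1/2, recall 1/3 and an F-score of 0.4, computed from a handful of integer operations instead of a pass over the traces.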