PerfGuard: Binary-Centric Application Performance Monitoring in Production Environments
Chung Hwan Kim, Junghwan Rhee, Kyu Hyung Lee, Xiangyu Zhang, Dongyan Xu


  1. PerfGuard: Binary-Centric Application Performance Monitoring in Production Environments. Chung Hwan Kim, Junghwan Rhee, Kyu Hyung Lee, Xiangyu Zhang, Dongyan Xu

  2. Performance Problems

  3. Performance Diagnosis During Development
  • Complex dependencies and layers [PLDI '12]
  • Various usage scenarios
  • Limited testing environments
  Example (latency hidden across layers):
      // Layer #1
      void Main () {
        while (...) {
          Foo (input)
          Bar (input)
          Baz (input)
        }
      }
      // Layer #2
      void Foo (input) { ... }   // Latency
      int Bar (input) { ... }
      // Layer #3
      int Baz (input) { ... }    // Latency
  Performance diagnosis during production?

  4. Performance Diagnosis in Production
  • Software users do not have:
    • Source code
    • Development knowledge
  • But, they desire to analyze performance problems [SIGMETRICS '14]
    • Many users are service providers
    • 3rd-party components
  • Existing profilers and tracers (perf, Callgrind, Ftrace, OProfile, Gprof, gperftools, LTTng):
    • CPU usage sampling with constant overhead
    • Blind to program semantics
    • Limited by sampling frequency

  5. Performance Diagnosis in Production
  • PerfTrack: Microsoft products only
  • Application Performance Management (APM):
    • Limited number of pre-instrumented programs
    • Manual instrumentation with APIs
    • Requires source code and development knowledge

  6. Automated Perf. Diagnosis in Production?
  • Performance diagnosis without source code and development knowledge?
  • At what granularity should we measure performance?
  • When and where should we check performance?
  • How can we determine if a program is too slow?

  7. Key Ideas
  • Extract "hints" from program binaries through dynamic analysis
  • Use the hints to identify individual operations (units) and build a performance profile
  • Generate and inject performance checks: Assert (Latency <= Threshold)
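The injected check amounts to an assertion on a unit's elapsed time. A minimal Python sketch of that idea (the threshold table, unit-type name, and `perf_check` helper are all hypothetical; PerfGuard injects equivalent checks into x86 binaries):

```python
import time

# Hypothetical per-unit-type thresholds, as a performance profile
# might supply them (values invented for illustration).
THRESHOLDS_MS = {"refresh_status_bar": 110.0}

def perf_check(unit_type, start):
    """The injected check: Assert (Latency <= Threshold)."""
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return elapsed_ms <= THRESHOLDS_MS[unit_type]

start = time.monotonic()
result = sum(range(10_000))   # stand-in for the unit's actual work
ok = perf_check("refresh_status_bar", start)
```

In the real system the check fires inside the instrumented binary rather than returning a boolean to the caller.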

  8. PerfGuard: Binary-Centric Performance Check Creation and Injection
  • Pre-distribution: profile the binary, perform unit and performance guard identification, and instrument the program with performance guards
  • Deploy the instrumented program
  • Production-run: unit performance monitoring triggers Assert (Latency <= Threshold)
  • Feedback: unit performance inspection feeds back into the performance profile

  9. Unit Identification
  • Unit := one iteration of an event processing loop [NDSS '13]
    1) Most large-scale applications are event-driven
    2) They contain a small number of event processing loops
  • Type I: UI programs
      UiThread (…) {
        ...
        while (…) {
          e = GetEvent (…)
          DispatchEvent (e, callback)
        } // end while
        ...
      } // end UiThread
  • Type II: Server programs
      ListenerThread (…) {
        ...
        while (…) {
          job = Accept (…)
          Signal (e)
        } // end while
        ...
      } // end ListenerThread
      WorkerThread (…) {
        ...
        while (…) {
          Wait (e)
          Process (job)
        } // end while
        ...
      } // end WorkerThread
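The unit notion can be made concrete with a toy worker thread in which each dequeue-to-completion iteration is one unit (a Python sketch; the slide's programs are native binaries, and the `worker`, `jobs`, and `latencies` names are invented for illustration):

```python
import queue
import time

def worker(jobs, latencies):
    """Event processing loop: each iteration is one unit, timed
    from the moment a job is dequeued until it finishes."""
    while True:
        job = jobs.get()      # corresponds to Wait(e) / Accept(...)
        if job is None:       # sentinel: shut the loop down
            return
        start = time.monotonic()
        job()                 # corresponds to Process(job)
        latencies.append(time.monotonic() - start)

jobs = queue.Queue()
latencies = []
jobs.put(lambda: sum(range(1000)))
jobs.put(lambda: sorted(range(1000, 0, -1)))
jobs.put(None)
worker(jobs, latencies)       # records one latency per unit
```

Timing at this loop granularity is what lets PerfGuard attribute latency to individual operations rather than to the whole process.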

  10. Unit Classification Based on Control Flow
  • Units with different call trees have distinct performance
  • Threshold estimation for Assert (Latency <= Threshold): based on time samples (t1 … t9) of unit groups
  • Average of 11% deviation in the top 10 costly unit groups

  11. Unit Clustering
  • Hierarchical clustering of units
  • Unit distance: based on similarity between units
  • Unit type: a set of clustered units (e.g. unit types X, Y, W, Z)
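The clustering step can be illustrated with call trees represented as edge sets under a Jaccard distance. The greedy grouping below is a simplified stand-in for the hierarchical clustering the slide names; the distance metric and the `max_dist` cutoff are assumptions:

```python
def jaccard_distance(tree_a, tree_b):
    """Distance between two units' call trees, as sets of call edges."""
    a, b = set(tree_a), set(tree_b)
    return 1.0 - len(a & b) / len(a | b)

def cluster(units, max_dist=0.5):
    """Greedy grouping: put each unit into the first existing
    cluster whose representative is close enough, else start one."""
    clusters = []
    for u in units:
        for c in clusters:
            if jaccard_distance(u, c[0]) <= max_dist:
                c.append(u)
                break
        else:
            clusters.append([u])
    return clusters

units = [
    [("A", "B"), ("B", "D")],              # similar call trees ...
    [("A", "B"), ("B", "D"), ("D", "G")],  # ... cluster together
    [("A", "C"), ("C", "F")],              # a distinct unit type
]
types = cluster(units)
```

Each resulting cluster corresponds to one unit type with its own threshold.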

  12. Performance Guard Generation
  • 3 shared library functions:
      OnLoopEntry (...) {
        u = NewUnit (…)
      }
      OnUnitStart (...) {
        t = NewTimer ()
      }
      OnUnitContext (...) {
        x = GetUnitType (...)
        Assert (t.Elapsed <= x.Threshold)
      }
  • Input: unit performance profile
  • Placement in the event loop:
      Thread (…) {
        ...
        while (…) {
          Wait (e)
          Process (job)
        } // end while
        ...
      } // end Thread
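The cooperation of the three hooks can be mimicked in Python (a sketch under assumptions: the `classify` helper, the `Unit` class, and the profile values are invented; the real hooks live in an injected shared library and operate on binary-level state):

```python
import time

# Hypothetical unit performance profile: threshold (seconds) per type.
PROFILE = {"X": 0.1, "Y": 2.0}

class Unit:
    def __init__(self):
        self.timer = None   # set by on_unit_start
        self.path = []      # context blocks seen so far

def on_loop_entry():
    """Top of the event loop: a new unit begins."""
    return Unit()

def on_unit_start(unit):
    """The unit starts real work: start its timer."""
    unit.timer = time.monotonic()

def classify(path):
    # Toy unit-type classifier, for illustration only.
    return "X" if path and path[0] == "A" else "Y"

def on_unit_context(unit, block):
    """Context point: record the block, classify the unit so far,
    and check its elapsed time against the profile."""
    unit.path.append(block)
    unit_type = classify(unit.path)
    elapsed = time.monotonic() - unit.timer
    assert elapsed <= PROFILE[unit_type], "performance anomaly"

u = on_loop_entry()
on_unit_start(u)
on_unit_context(u, "A")
```

The division of labor mirrors the slide: unit creation at loop entry, timing at unit start, and the threshold assertion at context points.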

  13. How to Recognize Unit Types at Run-Time
  • Unit type election: mark each basic block with the unit types whose call trees contain it, and count total occurrences as the unit executes
  • Example call trees: X: A-B-D-G-E-C, Y: A-B-E-H-I-C, W: A-B-D-E-I-C, Z: A-B-D-C-F-J
  • Block markings: A, B, C: (X, Y, W, Z); D: (X, W, Z); E: (X, Y, W); F, J: (Z); G: (X); H: (Y); I: (Y, W)
  • Example: executing unit type X (path A-B-D-G-E-C) gives occurrence counts over time:
      X: 1 2 3 4 5 6
      Y: 1 2 2 2 3 4
      W: 1 2 3 3 4 5
      Z: 1 2 3 3 3 4
  • Unit type candidates over time: (X, Y, W, Z), (X, Y, W, Z), (X, W, Z), (X), (X), (X)
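The election can be sketched as occurrence counting over the slide's block markings (a Python sketch of the scheme as described; the `MARKS` table transcribes the example, and `elect_unit_type` is an invented name):

```python
# Basic blocks marked with the unit types whose call trees contain
# them, following the slide's example (trees X, Y, W, Z).
MARKS = {
    "A": {"X", "Y", "W", "Z"}, "B": {"X", "Y", "W", "Z"},
    "C": {"X", "Y", "W", "Z"}, "D": {"X", "W", "Z"},
    "E": {"X", "Y", "W"},      "F": {"Z"},
    "G": {"X"},                "H": {"Y"},
    "I": {"Y", "W"},           "J": {"Z"},
}

def elect_unit_type(path):
    """Count, per type, how many executed blocks carry its mark.
    A type marked on every block of the path wins the election."""
    counts = {t: 0 for marks in MARKS.values() for t in marks}
    for block in path:
        for t in MARKS[block]:
            counts[t] += 1
    return [t for t, n in counts.items() if n == len(path)]

winner = elect_unit_type(["A", "B", "D", "G", "E", "C"])
```

For the slide's path only X reaches a count of 6, matching the candidate sets narrowing to (X).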

  14. Binary Code Instrumentation
  • Modified x86 Detours [USENIX WinNT '99]
  • NOP insertion using BISTRO [ESORICS '13]
  • Original program state preserved
  • Arbitrary instruction instrumentation
  • Injected trampoline (+ marks the inserted call, * marks the inserted guard):
      Foo (…) {
        ...
      + CALL PG_for_X
        Instruction X
        ...
      }
      * PG_for_X (PC, SP) {
      *   <Save Registers>
      *   <Set PC and SP>
      *   <Save Error Number>
      *   // Do Performance Check
      *   <Restore Error Number>
      *   <Restore Registers>
      *   return
      * }

  15. Evaluation
  • Diagnosis of real-world performance bugs

  Program Name        Bug ID   Root Cause Binary  Unit Call Trees  Unit Call Paths  Unit Functions  Inserted Perf. Guards  Unit Threshold (ms)
  Apache              45464    Internal Library   8                17,423           635             138                    9,944
  MySQL Client        15811    Main Binary        24               255,126          106             13                     997
  MySQL Server        49491    Main Binary        8                270,454          980             303                    2,079
  7-Zip File Manager  S1       Main Binary        3                30,503           140             115                    122
  7-Zip File Manager  S2       Main Binary        2                27,922           139             127                    109
  7-Zip File Manager  S3       Main Binary        3                4,041            65              15                     110
  7-Zip File Manager  S4       Main Binary        6                26,842           143             120                    101
  Notepad++           2909745  Main Binary        16               352,831          711             370                    6,797
  ProcessHacker       3744     Main Binary        1                47,910           86              23                     3,104
  ProcessHacker       5424     Plug-in            32               62,136           69              19                     10

  Per buggy unit: 1-32 distinct call trees, 4,000-352,000 call paths, and 65-980 functions; on average, 124 insertions and a 2,337 ms threshold

  16. Use Case: Unit Call Stack Traces
  • Case 1: Apache HTTP Server (performance bug Apache 45464)
    Unit call stack (T: where Assert (…) triggered, R: root cause; the mod_dav_fs.so frames are the root cause functions):
      T  libapr-1.dll!convert_prot
         libapr-1.dll!more_finfo
         libapr-1.dll!apr_file_info_get
         libapr-1.dll!resolve_ident
      R  libapr-1.dll!apr_stat
         mod_dav_fs.so!dav_fs_walker
         mod_dav_fs.so!dav_fs_internal_walk
         mod_dav_fs.so!dav_fs_walk
         …
         libhttpd.dll!ap_run_process_connection
         libhttpd.dll!ap_process_connection
         libhttpd.dll!worker_main
  • Case 2: 7-Zip File Manager (performance bug 7-Zip S3)
    Unit call stack:
      T  7zFM.exe!NWindows::NCOM::MyPropVariantClear
         7zFM.exe!GetItemSize
      R  7zFM.exe!Refresh_StatusBar
         7zFM.exe!OnMessage
         7zFM.exe!NWindows::NControl::WindowProcedure
         USER32.dll!InternalCallWinProc
         …
         USER32.dll!DispatchMessageWorker
         USER32.dll!DispatchMessageW
         7zFM.exe!WinMain

  17. Performance Overhead
  • ApacheBench & SysBench: overhead < 3%
  • Apache HTTP Server: response time (ms) vs. requests (%), nearly identical with and without PerfGuard
  • MySQL Server: transactions per second vs. number of threads (8-512), nearly identical with and without PerfGuard

  18. Related Work
  • C. H. Kim, J. Rhee, H. Zhang, N. Arora, G. Jiang, X. Zhang, and D. Xu. IntroPerf: Transparent Context-Sensitive Multi-Layer Performance Inference Using System Stack Traces. In Proc. ACM SIGMETRICS 2014.
  • S. Han, Y. Dang, S. Ge, D. Zhang, and T. Xie. Performance Debugging in the Large via Mining Millions of Stack Traces. In Proc. ICSE 2012.
  • A. Nistor, P.-C. Chang, C. Radoi, and S. Lu. CARAMEL: Detecting and Fixing Performance Problems That Have Non-intrusive Fixes. In Proc. ICSE 2015.
  • A. Nistor, L. Song, D. Marinov, and S. Lu. Toddler: Detecting Performance Problems via Similar Memory-access Patterns. In Proc. ICSE 2013.
  • X. Xiao, S. Han, D. Zhang, and T. Xie. Context-sensitive Delta Inference for Identifying Workload-dependent Performance Bottlenecks. In Proc. ISSTA 2013.
  • Y. Liu, C. Xu, and S.-C. Cheung. Characterizing and Detecting Performance Bugs for Smartphone Applications. In Proc. ICSE 2014.
  • L. Ravindranath, J. Padhye, S. Agarwal, R. Mahajan, I. Obermiller, and S. Shayandeh. AppInsight: Mobile App Performance Monitoring in the Wild. In Proc. OSDI 2012.

  19. Conclusion
  • PerfGuard enables diagnosis of performance problems without source code and development knowledge
  • Unit-based performance profiling allows targeting a general scope of software
  • PerfGuard automatically detects performance problems with low run-time overhead (< 3%)

  20. Thank you! Questions? Chung Hwan Kim chungkim@cs.purdue.edu
