Application Heartbeats
Henry Hoffmann, Jonathan Eastep, Marco Santambrogio, Jason Miller, Anant Agarwal
CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139
http://groups.csail.mit.edu/carbon/heartbeats
Outline
• Introduction/Motivation
  – Problem: monitoring applications in self-tuning systems
  – Solution: a standard interface that expresses performance and goals
• Application Heartbeats
• Experiments
• Conclusion
As System Complexity Increases, Self-Tuning Systems Emerge
• System complexity is skyrocketing
  – Multicore processors
  – Parallel communication libraries
  – Heterogeneous architectures
  – Distributed, deep memory hierarchies
  – Special-purpose functional units
  – Unreliable components
  – New constraints: power, energy, wire delay
• Application programmers must be experts in both systems and applications
Possible solution: self-tuning systems, which observe their runtime behavior, learn, and take actions to meet desired goals
Self-Tuning Systems Must Monitor the Applications They Support
Currently, applications run as performance black boxes:
[Figure: three-layer system. Application layer (App 1, App 2, App 3); self-tuning services layer (scheduler, memory manager, file system, I/O); operating system and devices (cores, caches, DRAM, disk). Tunable or observable quantities include speed, precision, voltage, frequency, cache size, associativity, IPC, power, temperature, and miss rate, but not application performance.]
We propose Application Heartbeats as a standard API for applications to specify their goals and performance to self-tuning system services
Outline
• Introduction/Motivation
• Application Heartbeats
  – Idea
  – Interface
• Experiments
• Conclusion
The Application Heartbeats Idea
• At key intervals, apps issue a heartbeat using a simple function call
• Apps also register their desired performance with other function calls
• The performance (heart rate) can be read (a) within the application itself or (b) by another process
• If performance is too low, the system adapts to increase it
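Case (b), reading the heart rate from another process, can be illustrated with a minimal C sketch, assuming POSIX `fork` and an anonymous `MAP_SHARED` mapping: a child process plays the application and writes beat timestamps into a shared record, and the parent plays the external observer and computes the rate. The record layout (`hb_shared_t`), all function names, the simulated timestamps, and the rate formula (beats minus one, divided by elapsed time) are hypothetical choices for illustration, not the reference implementation's format.

```c
#define _DEFAULT_SOURCE /* expose MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define HB_MAX_BEATS 64

/* Hypothetical heartbeat record shared between the app and an observer. */
typedef struct {
    int count;                  /* number of beats recorded */
    double stamp[HB_MAX_BEATS]; /* time of each beat, in seconds */
} hb_shared_t;

/* Application side: one heartbeat per unit of work. Timestamps are
   simulated (period seconds apart) so the demo is deterministic. */
static void app_issue_beats(hb_shared_t *hb, int n, double period) {
    for (int i = 0; i < n && i < HB_MAX_BEATS; i++) {
        hb->stamp[i] = i * period;
        hb->count = i + 1;
    }
}

/* Observer side: average heart rate (beats/s) over all recorded beats. */
static double observer_rate(const hb_shared_t *hb) {
    int n = hb->count;
    if (n < 2) return 0.0;
    double elapsed = hb->stamp[n - 1] - hb->stamp[0];
    return elapsed > 0.0 ? (n - 1) / elapsed : 0.0;
}

/* Fork an "application" that beats n times; the parent acts as the external
   observer. The parent reads only after the child exits, so no locking is
   needed here; a reader polling a live app would need atomics or a lock.
   Returns the observed rate, or -1.0 if mmap or fork fails. */
static double demo_external_read(int n, double period, int *beats_seen) {
    assert(beats_seen != NULL);
    hb_shared_t *hb = mmap(NULL, sizeof *hb, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (hb == MAP_FAILED) return -1.0;
    hb->count = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(hb, sizeof *hb);
        return -1.0;
    }
    if (pid == 0) {             /* child: the monitored application */
        app_issue_beats(hb, n, period);
        _exit(0);
    }
    waitpid(pid, NULL, 0);      /* parent: wait, then read the shared record */
    *beats_seen = hb->count;
    double rate = observer_rate(hb);
    munmap(hb, sizeof *hb);
    return rate;
}
```

For example, `demo_external_read(10, 0.1, &seen)` should report 10 beats at roughly 10 beats/s. Case (a) is the same computation done by the application on its own record.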
Application Heartbeats Provide a Standard API for Expressing Performance and Goals
[Figure: the same layered system with an Application Heartbeats layer added. Apps are no longer black boxes; each publishes its heartbeat and performance goals. App 1: min heart rate 10, max 100, current 75. App 2: min 29.5, max 30, current 29.8. App 3: min 0.5, max 1.5, current 0.2. Services: scheduler, memory manager, file system, I/O; devices: cores, caches, DRAM, disk.]
• Application Heartbeats express goals and current performance
• System software can use Heartbeats to measure performance directly
Heartbeat API Functions
• heartbeat_initialize([int] window_size): Initializes the heartbeat object to collect heartbeats; uses a sliding window of window_size heartbeats to calculate the current heart rate
• heartbeat([int] tag): Records a heartbeat with the given tag
• hb_get_current_rate(): Returns the current heart rate, averaged over the last window_size heartbeats
• hb_set_target_rate([float] min, [float] max): Sets the desired minimum and maximum heart rates for this app
• hb_get_target_min_rate(): Returns the minimum desired heart rate
• hb_get_target_max_rate(): Returns the maximum desired heart rate
• hb_set_target_latency([float] min, [float] max, [int] tag1, [int] tag2): Sets the desired latency between heartbeats with tags tag1 and tag2
• hb_get_min_latency([int] tag1, [int] tag2): Returns the minimum desired latency between two tags
• hb_get_max_latency([int] tag1, [int] tag2): Returns the maximum desired latency between two tags
• hb_get_history([int] n): Returns all heartbeat information for the last n heartbeats
The Heartbeat API allows direct communication of performance and goals
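The rate and target calls can be sketched as a small in-process C library. This is a minimal sketch under stated assumptions, not the reference implementation: the `heartbeat_t` state struct, the explicit timestamp argument to `heartbeat` (the listed version takes only a tag), the `heartbeat_finalize` cleanup helper, and defining the current rate as (beats in window minus one) divided by the time the window spans are all hypothetical choices. The latency and history calls are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-process heartbeat state; field names are illustrative. */
typedef struct {
    int window_size;          /* beats used for the current-rate average */
    int count;                /* total beats recorded */
    double *stamp;            /* ring buffer of the last window_size timestamps (s) */
    int *tag;                 /* matching tags, kept for history queries (not shown) */
    float min_rate, max_rate; /* targets set via hb_set_target_rate */
} heartbeat_t;

int heartbeat_initialize(heartbeat_t *hb, int window_size) {
    assert(window_size >= 2);
    hb->window_size = window_size;
    hb->count = 0;
    hb->stamp = malloc(sizeof(double) * window_size);
    hb->tag = malloc(sizeof(int) * window_size);
    hb->min_rate = hb->max_rate = 0.0f;
    return (hb->stamp && hb->tag) ? 0 : -1;
}

/* Record a beat. The timestamp is passed in (an assumption) so results are
   reproducible; a real version might read a monotonic clock instead. */
void heartbeat(heartbeat_t *hb, int tag, double now) {
    int slot = hb->count % hb->window_size;
    hb->stamp[slot] = now;
    hb->tag[slot] = tag;
    hb->count++;
}

/* Beats per second over the most recent beats (at most window_size). */
double hb_get_current_rate(const heartbeat_t *hb) {
    int n = hb->count < hb->window_size ? hb->count : hb->window_size;
    if (n < 2) return 0.0;
    int newest = (hb->count - 1) % hb->window_size;
    int oldest = (hb->count - n) % hb->window_size;
    double elapsed = hb->stamp[newest] - hb->stamp[oldest];
    return elapsed > 0.0 ? (n - 1) / elapsed : 0.0;
}

void hb_set_target_rate(heartbeat_t *hb, float min, float max) {
    hb->min_rate = min;
    hb->max_rate = max;
}
float hb_get_target_min_rate(const heartbeat_t *hb) { return hb->min_rate; }
float hb_get_target_max_rate(const heartbeat_t *hb) { return hb->max_rate; }

/* Hypothetical cleanup helper (not in the API list). */
void heartbeat_finalize(heartbeat_t *hb) {
    free(hb->stamp);
    free(hb->tag);
}
```

With a window of 4 and beats every 0.25 s, `hb_get_current_rate` returns 4 beats/s; an app with a target minimum of 10 would then know it is running too slowly.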
Heartbeats Reference Implementations
http://groups.csail.mit.edu/carbon/heartbeats
• Callable from C/C++
• Files, for distributed computing
  – Performance (1): throughput ~0.900 Kbeat/s, latency ~1000 µs
• Shared memory, for multicore
  – Performance (2): throughput ~1500 Kbeat/s, latency ~1.5 µs
(1) Intel Xeon servers @ 3.16 GHz with Linux and NFS
(2) Intel Xeon servers @ 3.16 GHz with Linux and POSIX shared memory
Outline
• Introduction/Motivation
• Application Heartbeats
• Experiments
  – Heartbeat use within an application
  – Heartbeat use by an external system
  – Other systems using Heartbeats
• Conclusion
Experiment 1: Internal Heartbeat Usage
• Experiment 1: adaptive H.264 encoder
• Goal: produce the highest-quality video possible in real time
• Method:
  – The encoder issues a heartbeat for each frame (heart rate = frame rate)
  – The encoder reads its own heart rate and changes its algorithm to reach the target
• Results:
  – The encoder becomes fast while remaining high quality
  – It achieves the target performance with barely visible quality loss
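The encoder's internal control loop can be sketched in C. Everything below is hypothetical: the four effort levels, their per-frame costs, and the rule of stepping to the next cheaper level whenever the heart rate is below target stand in for the encoder's real configuration changes (motion-search method, macroblock modes, rate-distortion optimization), whose exact policy is not given here.

```c
#include <assert.h>

/* Hypothetical effort levels: 0 = highest quality (slowest) ... 3 = fastest. */
enum { EFFORT_LEVELS = 4 };

/* Hypothetical encoding time per frame (seconds) at each effort level. */
static const double frame_cost[EFFORT_LEVELS] = {0.10, 0.06, 0.04, 0.025};

/* One heartbeat per frame, so heart rate == frames/s. If the encoder is
   below its target, step to the next cheaper level; otherwise keep quality. */
int adapt_effort(int level, double heart_rate, double target_rate) {
    assert(level >= 0 && level < EFFORT_LEVELS);
    if (heart_rate < target_rate && level < EFFORT_LEVELS - 1)
        return level + 1;
    return level;
}

/* Toy simulation: encode `frames` frames, treating each frame's heart rate
   as 1 / frame_cost[level], and return the level the encoder settles on. */
int simulate_encoder(double target_rate, int frames) {
    int level = 0;
    for (int f = 0; f < frames; f++) {
        double rate = 1.0 / frame_cost[level];
        level = adapt_effort(level, rate, target_rate);
    }
    return level;
}
```

With a 20 frames/s target, `simulate_encoder(20.0, 10)` settles on level 2 (25 frames/s in this toy model). A real encoder would average over the heartbeat window rather than react to single frames, and might step back up when it has headroom.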
Example 1: Performance
[Figure: heart rate (frames/s, 0 to 35) vs. time (frame number, 0 to 600) for the adaptive encoder, plotted against the target heart rate. Annotations mark the encoder's adaptations: umh search, dia search, esa search, eliminated I4x4 mode in P-frames, eliminated P8x8 mode in P-frames, eliminated sub-16x16 modes in P-frames, eliminated I4x4 mode in I-frames, eliminated rate-distortion optimizations, reduced sub-pixel search.]
Example 1: Image Quality
[Figure: PSNR difference (dB, -1.2 to 0.8) vs. frame number (0 to 600).]
Example 2: External Heartbeat Usage
• Experiment 2: an external system reads the heart rate and assigns cores
• Goal: assign cores to keep performance within the target range
• Method: use the PARSEC benchmarks
  – Target heart rates are set so they are achievable with fewer than the full number of cores
• Results:
  – The scheduler keeps the applications running at the target speed
  – The scheduler adapts to changes in the difficulty of the inputs
Example 2: bodytrack
[Figure: heart rate (beat/s, 0 to 4.5) and cores (0 to 9) vs. time (heartbeat, 0 to 250); series: heart rate, target min, target max, cores.]
Example 2: streamcluster
[Figure: heart rate (beat/s, 0 to 0.8) and cores (0 to 8) vs. time (heartbeat, 0 to 80); series: heart rate, target min, target max, cores.]
Example 2: x264
[Figure: heart rate (beat/s, 0 to 50) and cores (0 to 10) vs. time (heartbeat, 0 to 600); series: heart rate, target min, target max, cores.]
Other Heartbeat Uses
• SpeedPress compiler and SpeedGuard runtime system
  – The SpeedPress compiler discovers possible quality-of-service/performance tradeoffs
    • Achieves up to 2x speedup for a 5% QoS loss
  – The SpeedGuard runtime makes these tradeoffs dynamically to maintain a given heart rate in the face of environmental changes
  – More detail available in: Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042. August 2009.
• SmartLocks
  – Subject of an upcoming SMART talk
Outline
• Introduction/Motivation
• Application Heartbeats
• Experiments
• Conclusion
  – Request for feedback/usage
  – Summary
Request for Feedback
• Thanks to the reviewers for their feedback, but we need more…
• Heartbeat code is available online: http://groups.csail.mit.edu/carbon/heartbeats
• We need your feedback!
  – If you have a self-tuning system service that could benefit from directly measuring an application's performance, try the interface
  – Let us know what you think
Summary
• Presented the Application Heartbeats interface
  – The API provides a standard means for an application to make its performance and goals known
• Presented several experiments showing basic usage
  – Several other systems at MIT are using Heartbeats in more advanced applications
• Requested feedback from the community
Adaptive Scheduling Algorithm
• Take the average heart rate over the last 20 beats
• If the average heart rate < target min
  – Add a core
  – Wait for 20 beats and repeat
• Else if the average heart rate > target max
  – Remove a core
  – Wait for 20 beats and repeat
• Else
  – Repeat
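The scheduling algorithm translates directly into C. The 20-beat window comes from the slide; clamping the core count to [1, max_cores], making a decision every 20 beats even when the rate is in range, and the toy throughput model (each core adds a fixed number of beats/s) are assumptions for illustration, and the function names are hypothetical.

```c
#include <assert.h>

#define WINDOW 20 /* beats to average over and to wait between decisions (from the slide) */

/* One decision of the adaptive scheduler. Clamping to [1, max_cores] is an
   assumption; the slide does not say what happens at the limits. */
int schedule_step(double avg_rate, double target_min, double target_max,
                  int cores, int max_cores) {
    assert(cores >= 1 && cores <= max_cores);
    if (avg_rate < target_min && cores < max_cores)
        return cores + 1; /* too slow: add a core */
    if (avg_rate > target_max && cores > 1)
        return cores - 1; /* too fast: remove a core */
    return cores;         /* within range: keep the allocation */
}

/* Toy model (hypothetical): each core contributes per_core beats/s, so the
   average over any WINDOW beats at a fixed allocation is cores * per_core.
   For simplicity a decision is made every WINDOW beats, including the
   in-range case. Returns the final core count. */
int simulate_scheduler(double per_core, double target_min, double target_max,
                       int start_cores, int max_cores, int total_beats) {
    int cores = start_cores;
    int beats_since_decision = 0;
    for (int b = 0; b < total_beats; b++) {
        if (++beats_since_decision < WINDOW)
            continue; /* wait for a full window of beats */
        cores = schedule_step(cores * per_core, target_min, target_max,
                              cores, max_cores);
        beats_since_decision = 0;
    }
    return cores;
}
```

For instance, with 0.5 beats/s per core and a target of 2 to 3 beats/s, starting from one core the simulation settles on 4 cores (2 beats/s); starting from 8 cores it sheds cores until it reaches 6 (3 beats/s).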