
Self-Monitoring and Self-Adapting Systems



What do we do with Data?
• Off-line Analysis
    • Monitors long-term behavior.
    • Identifies common usage profiles.
    • Suggests thresholds to the online system.
    • Conducts feasibility evaluations.
• Online Analysis
    • Monitors instantaneous resource utilization.
    • Maintains efficiency statistics.
    • Detects dangerous conditions.

Off-line Analysis
• Use data from the measurement thread to construct a time-series usage profile.
• Conduct variance analysis.
• Detect uncommon usage.
• Construct predicted usage profiles.
• Determine resource thresholds from the predicted profiles.
• Notify the online system of the thresholds.
• Evaluate traces and logs; derive new algorithms.
• Simulate new algorithms in situ.

Online Analysis
• Receive threshold and variance information from the off-line system.
• Maintain dynamic statistics about:
    • Cache hit rates.
    • Lock contention.
    • Disk queue lengths.
    • Load averages.
    • Context switch rates.
• Detect abnormal behavior.
• Dynamically trigger trace generation.
• Trigger adaptation heuristics.

Adaptation Heuristics
• Goal: decrease application latency.
• Paging
    • Collect a page access trace.
    • Look for well-known patterns (linear, cyclic, strided).
    • Look for page access correlation.
    • Install a better prefetching algorithm.
• Disk Wait
    • Similar process to paging.
    • Replace read-ahead for the application(s).
• CPU Hogs
    • Examine profile output.
    • Recompile kernel modules in application context.
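The division of labor between off-line and online analysis can be illustrated with a small sketch. Assuming the off-line usage profile is simply a list of samples of one statistic (say, a buffer-cache hit rate), one simple form of variance analysis computes the mean and standard deviation and hands the online side a band of mean ± k standard deviations; the online side then flags any instantaneous reading outside that band. The function names, the mean ± kσ band, and the default k = 3 are illustrative assumptions, not the analysis VINO actually performed.

```python
from statistics import mean, stdev
from typing import List, Tuple

Band = Tuple[float, float]


def derive_threshold(history: List[float], k: float = 3.0) -> Band:
    """Off-line step: summarize a usage profile and suggest a threshold band.

    The band is mean +/- k standard deviations; k = 3.0 is an illustrative
    default (an assumption), not a value taken from the talk.
    """
    if len(history) < 2:
        raise ValueError("need at least two samples for variance analysis")
    mu = mean(history)
    sigma = stdev(history)
    return mu - k * sigma, mu + k * sigma


def is_abnormal(sample: float, band: Band) -> bool:
    """Online step: flag an instantaneous reading outside the off-line band."""
    low, high = band
    return not (low <= sample <= high)


if __name__ == "__main__":
    # Hypothetical hourly buffer-cache hit rates from the measurement thread.
    hit_rates = [0.90, 0.92, 0.91, 0.89, 0.93, 0.90]
    band = derive_threshold(hit_rates)
    print(is_abnormal(0.91, band))  # inside the band
    print(is_abnormal(0.60, band))  # far below the band: would trigger adaptation
```

A real system would keep a separate band per statistic (and possibly per time of day, since the off-line side builds a time-series profile); a single list keeps the sketch short.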

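The first steps of the paging heuristic (collect a page access trace, then look for linear, cyclic, or strided patterns) might look like this in simplified form. The sketch assumes a trace is a list of page numbers and treats "strided" as any constant non-unit step, so a descending scan also counts as strided. The function name and the rule for predicting the next page to prefetch are hypothetical, and page access correlation is not attempted.

```python
from typing import List, Optional, Tuple


def classify_trace(trace: List[int]) -> Tuple[str, Optional[int]]:
    """Classify a page access trace and predict the next page to prefetch.

    Returns ("linear" | "strided" | "cyclic" | "unknown", next_page or None).
    """
    n = len(trace)
    if n < 3:
        return "unknown", None

    # Linear or strided: every consecutive difference is the same non-zero step.
    diffs = [b - a for a, b in zip(trace, trace[1:])]
    step = diffs[0]
    if step != 0 and all(d == step for d in diffs):
        kind = "linear" if step == 1 else "strided"
        return kind, trace[-1] + step

    # Cyclic: the trace repeats with some period p; requiring p <= n // 2
    # means at least two full cycles have been observed.
    for p in range(1, n // 2 + 1):
        if all(trace[i] == trace[i - p] for i in range(p, n)):
            # The next access repeats the one exactly one period earlier.
            return "cyclic", trace[n - p]

    return "unknown", None
```

For instance, `classify_trace([0, 4, 8, 12])` returns `("strided", 16)` and `classify_trace([1, 5, 9, 1, 5, 9, 1])` returns `("cyclic", 5)`; the predicted page is what a replacement prefetching algorithm would fetch ahead of time.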
Adaptation (continued)
• Interrupt Latency
    • Measure the latency between interrupt arrival and delivery to the process/thread.
    • Look for excessively long intervals or high variance.
    • Check (and fix) scheduling priorities.
• Lock Contention
    • Measure lock wait times.
    • Decrease lock granularity on highly contended items.

Conclusions
• Self-monitoring is a generally useful idea.
    • An extensible system just makes it easier.
• Automatic adaptation is a cool idea.
    • It is challenging to do correctly.
    • An extensible system makes it easier to experiment with.
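For the lock contention heuristic, measuring lock wait times can be as simple as taking a timestamp before and after each acquire. This sketch wraps Python's `threading.Lock`; the `TimedLock` and `contended_locks` names and the mean-wait cutoff are assumptions for illustration. It only identifies candidates for finer-grained locking; actually decreasing lock granularity still means restructuring the data the lock protects.

```python
import threading
import time
from collections import defaultdict
from typing import DefaultDict, List

# Per-lock list of wait times in seconds, keyed by a descriptive lock name.
WaitStats = DefaultDict[str, List[float]]


class TimedLock:
    """A threading.Lock wrapper that records how long each acquire waited."""

    def __init__(self, name: str, stats: WaitStats):
        self.name = name
        self._lock = threading.Lock()
        self._stats = stats

    def __enter__(self) -> "TimedLock":
        start = time.perf_counter()
        self._lock.acquire()
        # Record the wait only after the lock is actually held.
        self._stats[self.name].append(time.perf_counter() - start)
        return self

    def __exit__(self, *exc_info) -> bool:
        self._lock.release()
        return False  # do not suppress exceptions


def contended_locks(stats: WaitStats, min_mean_wait: float) -> List[str]:
    """Names of locks whose mean wait exceeds min_mean_wait seconds, worst first."""
    means = {name: sum(w) / len(w) for name, w in stats.items() if w}
    hot = [name for name, m in means.items() if m > min_mean_wait]
    return sorted(hot, key=lambda name: means[name], reverse=True)


if __name__ == "__main__":
    stats: WaitStats = defaultdict(list)
    cache_lock = TimedLock("buffer_cache", stats)  # hypothetical lock name
    with cache_lock:
        pass
    print(dict(stats))
```

The same before/after timestamping pattern applies to interrupt latency: record arrival and delivery times and look at the spread of the differences rather than only their mean.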

Self-Monitoring and Self-Adapting Systems
Margo I. Seltzer
Harvard University, Division of Engineering and Applied Sciences

Assumptions
• Performance is important.
• People do not really know how to tune applications and systems.
• It would be nice to get some help from the system in tuning.
• Self-monitoring systems gather information about their own performance.

Outline
• Self-Monitoring in VINO.
• Processing monitor data.
• Adapting to system behavior.
• Conclusions.

Self-Monitoring in VINO
• A measurement thread periodically collects module statistics.
• Generate detailed profiling information.
• Capture module inputs (traces) and outputs (logs).
• In-situ simulation evaluates competing algorithms and policies.
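A user-space analogue of the measurement thread can be sketched as a background thread that wakes every `interval` seconds, calls a statistics callback for each registered module, and appends timestamped snapshots to a history. The callback echoes the `get_stats` label in the VINO diagram, but the registration interface, the in-memory history, and all class and method names are hypothetical; in VINO the measurement thread is itself a graft running inside the kernel.

```python
import threading
import time
from typing import Callable, Dict, List, Tuple

Stats = Dict[str, float]
GetStats = Callable[[], Stats]


class MeasurementThread:
    """Periodically polls every registered module's statistics callback and
    appends timestamped snapshots to an in-memory history."""

    def __init__(self, interval: float = 1.0):
        self.interval = interval
        self.modules: Dict[str, GetStats] = {}
        self.history: List[Tuple[float, str, Stats]] = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def register(self, name: str, get_stats: GetStats) -> None:
        """Attach a module (e.g. "buffer_cache") to be sampled on every pass."""
        self.modules[name] = get_stats

    def collect_once(self) -> None:
        now = time.monotonic()
        for name, get_stats in list(self.modules.items()):
            # Copy so later mutation by the module cannot alter the history.
            self.history.append((now, name, dict(get_stats())))

    def _run(self) -> None:
        # Sample first, then sleep; Event.wait returns True once stop() is called.
        while True:
            self.collect_once()
            if self._stop.wait(self.interval):
                break

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        self._thread.join()
```

Sampling before the first wait guarantees at least one snapshot per run, and using an `Event` rather than `time.sleep` lets `stop()` interrupt the wait immediately instead of waiting out a full interval.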

Measurement Thread
[Figure: a measurement thread (graft) uses get_stats at graft points in the VINO kernel's buffer cache, file, lock, txn, and other systems, and emits output data; incoming requests and outputs flow through the subsystems.]

Generating Traces and Logs
[Figure: incoming requests pass through a pass-through graft in front of the buffer cache, which records the real parameters; outputs are recorded on the way out.]

In-Situ Simulation
[Figure: incoming requests go to the real buffer cache while their parameters are also passed to a buffer-cache simulator; the real cache produces the outputs and the simulator produces simulation results.]
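The pass-through idea in the "Generating Traces and Logs" diagram can be sketched as a wrapper that forwards each request unchanged while recording its real parameters as the trace and its result as the log. The class name and the in-memory lists are assumptions; a kernel implementation would more plausibly write to a bounded buffer or file.

```python
from typing import Any, Callable, Dict, List, Tuple


class PassThroughRecorder:
    """Wrap a module entry point: forward every call unchanged while
    recording the real parameters (the trace) and the results (the log)."""

    def __init__(self, target: Callable[..., Any]):
        self.target = target
        self.trace: List[Tuple[Tuple[Any, ...], Dict[str, Any]]] = []
        self.log: List[Any] = []

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        self.trace.append((args, dict(kwargs)))
        result = self.target(*args, **kwargs)
        self.log.append(result)
        return result


if __name__ == "__main__":
    # Hypothetical buffer-cache lookup standing in for a real kernel module.
    def lookup(block: int, write: bool = False) -> str:
        return f"block {block} ({'w' if write else 'r'})"

    recorded = PassThroughRecorder(lookup)
    recorded(7)
    recorded(9, write=True)
    print(recorded.trace)  # [((7,), {}), ((9,), {'write': True})]
    print(recorded.log)    # ['block 7 (r)', 'block 9 (w)']
```

Because the wrapper returns exactly what the target returns, callers cannot tell it is there, which is what makes it safe to switch trace generation on dynamically.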

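In-situ simulation can be approximated by feeding the same request stream to the real buffer cache and to a simulated alternative whose answers are recorded but never returned to the caller. The sketch uses LRU and FIFO replacement purely as example competing policies; the talk does not say which policies VINO compared, and all class and method names here are illustrative.

```python
from collections import OrderedDict, deque
from typing import Deque, Hashable, Set


class LRUCache:
    """Evicts the least recently used block."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks: OrderedDict = OrderedDict()
        self.hits = self.misses = 0

    def access(self, block: Hashable) -> bool:
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block] = None
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # drop the least recently used block
        return False


class FIFOCache:
    """Evicts the block that was loaded earliest, ignoring reuse."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue: Deque[Hashable] = deque()
        self.present: Set[Hashable] = set()
        self.hits = self.misses = 0

    def access(self, block: Hashable) -> bool:
        if block in self.present:
            self.hits += 1
            return True
        self.misses += 1
        self.queue.append(block)
        self.present.add(block)
        if len(self.queue) > self.capacity:
            self.present.discard(self.queue.popleft())
        return False


class InSituSimulation:
    """Serve each request from the real cache and pass the same parameters
    to a simulated alternative policy whose answers are only recorded."""

    def __init__(self, real, simulated):
        self.real = real
        self.simulated = simulated

    def access(self, block: Hashable) -> bool:
        self.simulated.access(block)    # shadow run; result is discarded
        return self.real.access(block)  # the caller only sees the real result

    def simulated_is_better(self) -> bool:
        return self.simulated.hits > self.real.hits
```

For example, replaying the block sequence `[1, 2, 1, 3, 1, 4, 1, 5]` through `InSituSimulation(FIFOCache(2), LRUCache(2))` gives the real FIFO cache 2 hits and the shadow LRU cache 3, so `simulated_is_better()` returns `True`: evidence, on this one trace, that switching policies could help.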
  5. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  6. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  7. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  8. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  9. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  10. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  11. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  12. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  13. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  14. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  15. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  16. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  17. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  18. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  19. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  20. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  21. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  22. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  23. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  24. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  25. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  26. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  27. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  28. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  29. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  30. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  31. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  32. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  33. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  34. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

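The interrupt-latency check in the same slide reduces to arithmetic on pairs of timestamps. This sketch assumes each event is recorded as an (arrival, delivery) pair in integer microseconds. It treats a latency above a fixed limit as an "excessively long interval" and uses the coefficient of variation (stddev / mean) above a limit as "high variance". Both limits and the name latency_report are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def latency_report(events: List[Tuple[int, int]], max_latency: int,
                   max_cv: float = 1.0) -> Dict[str, object]:
    """Summarise interrupt delivery latency.

    events: (interrupt arrival, delivery to process/thread) timestamps,
    assumed to be in microseconds.
    """
    latencies = [delivered - arrived for arrived, delivered in events]
    if not latencies:
        return {"mean": 0.0, "long_intervals": [],
                "high_variance": False, "check_priorities": False}
    mean = statistics.fmean(latencies)
    stdev = statistics.pstdev(latencies, mu=mean)
    # Coefficient of variation: spread relative to the typical latency.
    cv = stdev / mean if mean > 0 else 0.0
    long_intervals = [i for i, lat in enumerate(latencies) if lat > max_latency]
    high_variance = cv > max_cv
    return {
        "mean": mean,
        "long_intervals": long_intervals,
        "high_variance": high_variance,
        # Either symptom suggests checking (fixing) scheduling priorities.
        "check_priorities": bool(long_intervals) or high_variance,
    }
```
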
  35. Self-Monitoring and Self-Adapting Systems
Margo I. Seltzer
Harvard University
Division of Engineering and Applied Sciences

Assumptions
• Performance is important.
• People do not really know how to tune applications and systems.
• It would be nice to get some help from the system in tuning.
• Self-monitoring systems gather information about their own performance.

Outline
• Self-Monitoring in VINO.
• Processing monitor data.
• Adapting to system behavior.
• Conclusions.

Self-Monitoring in VINO
• Measurement thread periodically collects module statistics.
• Generate detailed profiling information.
• Capture module inputs (traces) and outputs (logs).
• In-situ simulation evaluates competing algorithms and policies.
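
Below is a minimal sketch of a measurement thread that periodically collects module statistics. It assumes each module exposes a get_stats() method returning a dict. That method name is borrowed from the diagram label; VINO's actual graft interface is not shown in these slides. A Python thread stands in for a kernel thread. MeasurementThread, Module and max_samples are hypothetical names.

```python
import threading
import time
from typing import Dict, Iterable, List, Protocol


class Module(Protocol):
    """Assumed interface: anything with a name and a get_stats() method."""
    name: str

    def get_stats(self) -> Dict[str, float]: ...


class MeasurementThread(threading.Thread):
    """Polls every module's get_stats() at a fixed interval and keeps a time series."""

    def __init__(self, modules: Iterable[Module], interval: float, max_samples: int = 0):
        super().__init__(daemon=True)
        self.modules = list(modules)
        self.interval = interval
        self.max_samples = max_samples  # 0 means "run until stop()"
        self.samples: List[Dict[str, object]] = []
        self._stop_event = threading.Event()

    def run(self) -> None:
        while not self._stop_event.is_set():
            snapshot: Dict[str, object] = {"time": time.monotonic()}
            for module in self.modules:
                snapshot[module.name] = dict(module.get_stats())
            self.samples.append(snapshot)
            if self.max_samples and len(self.samples) >= self.max_samples:
                break
            # Sleep one interval, but wake immediately if stop() is called.
            self._stop_event.wait(self.interval)

    def stop(self) -> None:
        self._stop_event.set()
```

The collected samples form the time series usage profile that the off-line analysis consumes.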

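Capturing a module's inputs (traces) and outputs (logs) amounts to interposing on its entry points. This sketch makes two assumptions. A Python decorator plays the role of the interposition layer. An enabled() callback lets the online analysis switch trace generation on and off dynamically. The names traced and read_block are hypothetical.

```python
import functools
from typing import Any, Callable, List, Tuple


def traced(trace: List[Tuple], log: List[Any],
           enabled: Callable[[], bool] = lambda: True):
    """Wrap a module entry point: record its inputs in `trace`, its outputs in `log`."""
    def decorate(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if not enabled():
                return fn(*args, **kwargs)  # tracing off: plain pass-through
            trace.append((fn.__name__, args, dict(kwargs)))
            result = fn(*args, **kwargs)
            log.append((fn.__name__, result))
            return result
        return wrapper
    return decorate


# Usage: a hypothetical buffer-cache entry point with tracing toggled at run time.
trace_buf: List[Tuple] = []
log_buf: List[Any] = []
tracing_on = [True]


@traced(trace_buf, log_buf, enabled=lambda: tracing_on[0])
def read_block(block_no: int, size: int = 4096):
    return ("data", block_no, size)
```
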
  36. Measurement Thread
[Figure: inside the VINO kernel, the measurement thread (itself a graft) calls get_stats at graft points on the Buffer Cache, file, lock, txn, and other systems, and emits the results as output data.]

Generating Traces and Logs
[Figure: incoming requests reach the Buffer Cache through graft points, where the requests and the resulting outputs are recorded.]

In-Situ Simulation
[Figure: incoming requests pass through to the real Buffer Cache; a graft passes the same parameters to a Buffer Cache Simulator and records the real parameters, so the simulator produces simulation results alongside the real outputs.]
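
In-situ simulation can be sketched as a pass-through wrapper. Every request goes to the real cache, and a copy goes to a simulated cache running a competing policy, so both see exactly the same workload. The sketch assumes FIFO is the installed policy and LRU is the candidate. The class names are hypothetical, and the simulator's answers are never returned to the caller.

```python
from collections import OrderedDict
from typing import Hashable


class FIFOCache:
    """Fixed-size block cache that evicts the block inserted longest ago."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.blocks = OrderedDict()  # front of the order = next to evict
        self.hits = 0
        self.misses = 0

    def _on_hit(self, block: Hashable) -> None:
        pass  # FIFO: a hit does not change eviction order

    def access(self, block: Hashable) -> bool:
        if block in self.blocks:
            self.hits += 1
            self._on_hit(block)
            return True
        self.misses += 1
        if len(self.blocks) >= self.capacity:
            self.blocks.popitem(last=False)  # evict from the front
        self.blocks[block] = None
        return False

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


class LRUCache(FIFOCache):
    """Same structure, but a hit moves the block to the back (most recently used)."""

    def _on_hit(self, block: Hashable) -> None:
        self.blocks.move_to_end(block)


class InSituSimulator:
    """Pass-through graft: the real cache serves requests; the candidate only observes."""

    def __init__(self, real: FIFOCache, candidate: FIFOCache):
        self.real = real
        self.candidate = candidate

    def access(self, block: Hashable) -> bool:
        self.candidate.access(block)    # simulated; its answer is discarded
        return self.real.access(block)  # the caller only sees the real result

    def candidate_is_better(self) -> bool:
        return self.candidate.hit_rate() > self.real.hit_rate()
```

One simple way to decide whether the candidate is worth installing is to compare hit rates after a window of requests. With capacity 2 and the request stream 1, 2, 1, 3, 1, 2, the simulated LRU cache scores 2 hits against FIFO's 1.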

  37. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  38. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  39. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  40. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  41. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  42. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  43. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  44. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  45. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  46. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  47. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  48. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  49. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  50. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  51. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  52. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  53. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  54. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  55. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  56. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  57. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  58. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  59. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  60. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  61. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  62. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  63. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  64. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  65. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  66. Adaptation (continued) Conclusions • Interrupt Latency • Self-monitoring is a generally useful idea. • Measure latency between interrupt arrival and delivery to process/thread. • An extensible system just makes it • Look for excessively long intervals or high variance. easier. • Check (fix) scheduling priorities. • Automatic adaptation is a cool idea. • Lock Contention • Challenging to do it correctly. • Measure lock wait times. • An extensible system makes it easier to • Decrease lock granularity on highly contested items. experiment with this. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  67. Self-Monitoring and Assumptions Self-Adapting Systems • Performance is important. • People do not really know how to tune applications and systems. V E R I • It would be nice to get some help from the system in tuning. T A S • Self-monitoring systems gather information about their own Margo I. Seltzer performance. Harvard University Division of Engineering and Applied Sciences Self-Monitoring and Self-Adaptation Outline Self-Monitoring in VINO • Self-Monitoring in VINO. • Measurement thread periodically collects module statistics. • Processing monitor data. • Generate detailed profiling information. • Adapting to system behavior. • Capture module inputs ( traces ) and • Conclusions. outputs ( logs ). • In-situ simulation evaluates competing algorithms and policies. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  68. Measurement Thread Generating Traces and Logs incoming requests measurement output data thread (graft) get_stats graft points Buffer Cache other other file lock txn systems other systems other system system system systems systems outputs VINO kernel Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Generating Traces and Logs In-Situ Simulation incoming requests pass parameters incoming requests to simulator and record real parameters pass- through Buffer Cache Buffer Cache Buffer Cache Simulator outputs simulation results outputs Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  69. What do we do with Data? Off-line Analysis • Off-line Analysis • Use data from measurement thread to construct time series usage profile. • Monitors long-term behavior. • Conduct variance analysis. • Identifies common usage profiles. • Detects uncommon usage. • Construct predicted usage profiles. • Suggests thresholds to online system. • Determine resource thresholds from • Conducts feasibility evaluations. predicted profiles. • Online Analysis • Notify online system of thresholds. • Monitor instantaneous resource utilization. • Evaluate traces and logs; derive new • Maintain efficiency statistics. algorithms. • Detect dangerous conditions. • Simulate new algorithms, in situ. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation Online Analysis Adaptation Heuristics • Receive threshold and variance • Goal: decrease application latency. information from off-line system. • Paging • Maintain dynamic statistics about: • Collect page access trace. • Cache hit rates. • Look for well-known patterns (linear, cyclic, strided). • Lock contention. • Look for page access correlation. • Disk queue lengths. • Install better prefetching algorithm. • Load averages. • Disk Wait • Context switch rates. • Similar process to paging. • Detect abnormal behavior. • Replace read-ahead for the application(s). • Dynamically trigger trace generation. • CPU Hogs • Examine profile output. • Trigger adaptation heuristics. • Recompile kernel modules in application context. Self-Monitoring and Self-Adaptation Self-Monitoring and Self-Adaptation

  70. Adaptation (continued)
• Interrupt Latency
    • Measure the latency between interrupt arrival and delivery to the process/thread.
    • Look for excessively long intervals or high variance.
    • Check (fix) scheduling priorities.
• Lock Contention
    • Measure lock wait times.
    • Decrease lock granularity on highly contested items.

Conclusions
• Self-monitoring is a generally useful idea.
• An extensible system just makes it easier.
• Automatic adaptation is a cool idea.
• It is challenging to do correctly.
• An extensible system makes it easier to experiment with this.
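The lock-contention heuristic (measure lock wait times, then pick highly contested locks as candidates for finer granularity) can be sketched at user level. `InstrumentedLock`, `contested_locks`, the lock names, and the `mean_wait_limit` cut-off are all hypothetical; a kernel would record waits inside its own lock primitives, and the actual decision to split a lock is left to a human or a later adaptation step.

```python
import threading
import time


class InstrumentedLock:
    """Hypothetical lock wrapper that records how long each acquire waited."""

    def __init__(self, name, stats):
        self.name = name
        self._lock = threading.Lock()
        self._waits = stats.setdefault(name, [])  # per-lock list of wait times (seconds)

    def __enter__(self):
        start = time.perf_counter()
        self._lock.acquire()
        # Safe to append: we now hold the lock that guards this list.
        self._waits.append(time.perf_counter() - start)
        return self

    def __exit__(self, exc_type, exc, tb):
        self._lock.release()
        return False


def contested_locks(stats, mean_wait_limit):
    """Return names of locks whose mean wait exceeds the limit, worst first."""
    means = {name: sum(w) / len(w) for name, w in stats.items() if w}
    return sorted(
        (name for name, mean in means.items() if mean > mean_wait_limit),
        key=lambda name: means[name],
        reverse=True,
    )


if __name__ == "__main__":
    stats = {}
    hot = InstrumentedLock("hot_lock", stats)    # hypothetical heavily shared lock
    cold = InstrumentedLock("cold_lock", stats)  # hypothetical lightly used lock

    def worker():
        for _ in range(5):
            with hot:
                time.sleep(0.002)  # long critical section makes other threads queue
            with cold:
                pass               # short critical section, little waiting

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(contested_locks(stats, mean_wait_limit=0.001))
```

Mean wait is only one possible contention metric; wait-time variance or the fraction of acquires that block would fit the slide's description equally well.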

  71. Self-Monitoring and Self-Adapting Systems
Margo I. Seltzer
Harvard University Division of Engineering and Applied Sciences

Assumptions
• Performance is important.
• People do not really know how to tune applications and systems.
• It would be nice to get some help from the system in tuning.
• Self-monitoring systems gather information about their own performance.

Outline
• Self-Monitoring in VINO.
• Processing monitor data.
• Adapting to system behavior.
• Conclusions.

Self-Monitoring in VINO
• A measurement thread periodically collects module statistics.
• Generate detailed profiling information.
• Capture module inputs (traces) and outputs (logs).
• In-situ simulation evaluates competing algorithms and policies.
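A measurement thread that periodically collects module statistics can be sketched with a background thread that polls each registered module. The `get_stats` method name follows the slide's diagram, but this Python interface, `BufferCacheModule`, and `MeasurementThread` are hypothetical; in VINO the thread is a kernel graft, not a user-level Python thread. Each sample is timestamped so the off-line side can build a time-series profile from it.

```python
import threading
import time


class BufferCacheModule:
    """Toy module exposing a get_stats() hook (hypothetical interface)."""

    def __init__(self):
        self.hits = 0
        self.misses = 0

    def get_stats(self):
        return {"hits": self.hits, "misses": self.misses}  # fresh snapshot each call


class MeasurementThread(threading.Thread):
    """Periodically polls every registered module and appends a timestamped sample."""

    def __init__(self, modules, interval):
        super().__init__(daemon=True)
        self.modules = modules    # name -> object with a get_stats() method
        self.interval = interval  # seconds between samples
        self.samples = []         # time series of (timestamp, name, stats)
        self._stop_event = threading.Event()

    def sample_once(self):
        now = time.monotonic()
        for name, module in self.modules.items():
            self.samples.append((now, name, module.get_stats()))

    def run(self):
        # Event.wait returns True once stop() is called, which ends the loop.
        while not self._stop_event.wait(self.interval):
            self.sample_once()

    def stop(self):
        self._stop_event.set()


if __name__ == "__main__":
    cache = BufferCacheModule()
    monitor = MeasurementThread({"buffer_cache": cache}, interval=0.02)
    monitor.start()
    for _ in range(10):
        cache.hits += 1
        time.sleep(0.01)
    monitor.stop()
    monitor.join()
    for timestamp, name, stats in monitor.samples:
        print(f"{timestamp:.3f} {name} {stats}")
```

Using `Event.wait` as the sleep lets `stop()` end the thread promptly instead of waiting out a full interval.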
