Using Multi-System Monitoring Time Series to Predict Performance Events

  1. Using Multi-System Monitoring Time Series to Predict Performance Events
      Andreas Schörgenhumer, Mario Kahlhofer, Peter Chalupar, Hanspeter Mössenböck, Paul Grünbacher
      09.11.2018

  8. Motivation
      • Figure: a time series over time t; an ML model is trained on the earlier part and then used to predict
      • Straightforward case: single system, single component, univariate time series

  13. Motivation
      • Multiple systems
      • Multiple, interlinked components per system, yielding multivariate time series
      • Events must be connected to the monitoring data
      • Train ML on the data of all these systems

  17. Approach
      • Pipeline: multi-system data + configs → preprocessing framework → CSVs → ML
      • Three parts: (1) Data, (2) Preprocessing, (3) Prediction

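  23. (1) Data
      • 20-day export from 250 systems, 1-minute resolution
      • Data model (figure): a System (1) has Services (*); Services (*) are linked to Hosts (*); a Host (1) has Disks (*) and Network Interfaces (*)
      • Events: service slowdowns, attached to services
      • Host: 11 time series (CPU load, Memory available, SWAP available, …)
      • Disk: 13 time series (Available, Read time, Write time, …)
      • Network interface: 10 time series (Bytes received, Bytes sent, Packets dropped, …)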
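
The data model on slide 23 can be sketched as plain Python dataclasses. This is a hypothetical illustration, not the authors' code: all class and field names (including `slowdown_minutes`) are assumptions, and only the entity types and the per-entity time-series counts (11 host, 13 disk, 10 network interface) come from the slide.

```python
from dataclasses import dataclass, field

# Time-series counts per entity type, as stated on slide 23 (11 + 13 + 10 = 34).
TIME_SERIES_PER_ENTITY = {"Host": 11, "Disk": 13, "NetworkInterface": 10}


@dataclass
class Disk:
    name: str


@dataclass
class NetworkInterface:
    name: str


@dataclass
class Host:
    name: str
    disks: list[Disk] = field(default_factory=list)                   # Host 1 : * Disk
    interfaces: list[NetworkInterface] = field(default_factory=list)  # Host 1 : * Network Interface


@dataclass
class Service:
    name: str
    hosts: list[Host] = field(default_factory=list)  # Service * : * Host
    # Hypothetical: minute offsets of slowdown events reported for this service.
    slowdown_minutes: list[int] = field(default_factory=list)


@dataclass
class System:
    name: str
    services: list[Service] = field(default_factory=list)  # System 1 : * Service


def distinct_series_types() -> int:
    """Number of time-series types across Host, Disk and Network Interface."""
    return sum(TIME_SERIES_PER_ENTITY.values())


def series_for_host(host: Host) -> int:
    """Individual series monitored for one host plus all of its disks and interfaces."""
    return (TIME_SERIES_PER_ENTITY["Host"]
            + TIME_SERIES_PER_ENTITY["Disk"] * len(host.disks)
            + TIME_SERIES_PER_ENTITY["NetworkInterface"] * len(host.interfaces))
```

For a host with three disks and one network interface, `series_for_host` gives 11 + 3 × 13 + 10 = 60 individual series, while the number of distinct series types stays at 34.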

  26. (2) Preprocessing – Framework
      • Input: YAML files (configurations/configs)
        • Contain all necessary data-processing settings
        • Easy to change thanks to the YAML format
      • Output: CSV files (feature vectors)
        • Portable format, directly usable for ML
      Example config:
        systems:
          - "sys1"
          - "sys2"
        timeSeries:
          - CPU_LOAD
        from: "2018-01-19 09:00"
        to: "2018-02-02 09:00"
        leadTime: 0
        observationWindowsBoxes:
          CPU_LOAD:
            - size: 60
              step: 1
              aggregationFunctions:
                - "AVG"
              combinationFunctions:
                - "AVG"
        samplingMode: "PER_EVENT"
        missingDataPointMode: "NAN"
        addAttributes: true
        ...
      Example output (CSV):
        CPU_LOAD:AVG,System,Label
        0.95,sys1,Event
        0.71,sys2,No event
        0.90,sys2,Event
        0.87,sys2,No event
        0.84,sys1,No event
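
How one PER_EVENT feature row might be derived from the slide 26 config can be sketched as follows. This is a minimal sketch under assumptions the slide does not state: each series is a list of per-minute values (matching the 1-minute resolution), the observation window is taken to end `leadTime` minutes before the event, and the labels of the samples are given. The function names are hypothetical, not the framework's API, and `step`, other sampling modes and the choice of negative ("No event") samples are not modeled.

```python
import csv
import io
import math
from statistics import mean


def window_avg(series: list[float], event_minute: int, size: int, lead_time: int) -> float:
    """AVG over the `size` minutes that end `lead_time` minutes before the event.

    Assumption: the window covers minutes [end - size, end) with
    end = event_minute - lead_time. NaN is returned when the window does not
    fit inside the series, in the spirit of missingDataPointMode: "NAN".
    """
    end = event_minute - lead_time
    start = end - size
    if size <= 0 or start < 0 or end > len(series):
        return math.nan
    return mean(series[start:end])


def feature_rows(samples, size: int = 60, lead_time: int = 0):
    """Yield one CSV row per (system, series, event_minute, label) sample."""
    for system, series, event_minute, label in samples:
        yield {
            "CPU_LOAD:AVG": window_avg(series, event_minute, size, lead_time),
            "System": system,
            "Label": label,
        }


def to_csv(rows) -> str:
    """Serialize feature rows with the column layout shown on slide 26."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["CPU_LOAD:AVG", "System", "Label"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

With `leadTime: 0` the window ends right at the event; a positive lead time shifts it further into the past, so the model has to predict the event that many minutes ahead.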

  27. (2) Preprocessing – Config Settings
      Setting                    Example
      Systems                    [sys1, sys2, ...]
      Time series                [Host: CPU_LOAD, Disk: AVAILABLE, ...]
      Time frame                 From: 19-01-2018 09:00, To: 02-02-2018 09:00
      Sampling mode              PER_EVENT, SLIDE_THROUGH
      Negative sampling source   NON_EVENT_SERVICES, EVENT_SERVICES, ...
      Lead time                  10 min
      Observation windows        60 min, AVG aggregation, AVG combination
      Missing data mode          DROP, NAN, LAST_VALUE, ...
      Metadata                   System, special attributes, ...
      ...                        ...
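
The missing-data modes listed on slide 27 could behave roughly as sketched below. The semantics are an assumption inferred only from the mode names (DROP discards a sample whose window has a gap, NAN keeps the gap as NaN, LAST_VALUE carries the previous observation forward); `None` marks a missing data point, and `apply_missing_mode` is a hypothetical name.

```python
import math
from typing import Optional


def apply_missing_mode(values: list[Optional[float]], mode: str) -> Optional[list[float]]:
    """Resolve missing data points (None) in one observation window.

    Returns None when the whole sample should be dropped (mode "DROP").
    """
    if mode == "DROP":
        # Assumed: a window with any gap yields no feature vector at all.
        return None if any(v is None for v in values) else list(values)
    if mode == "NAN":
        # Keep the gap visible to the ML model as NaN.
        return [math.nan if v is None else v for v in values]
    if mode == "LAST_VALUE":
        # Carry the last seen value forward; leading gaps stay NaN.
        out, last = [], math.nan
        for v in values:
            if v is not None:
                last = v
            out.append(last)
        return out
    raise ValueError(f"unsupported mode: {mode}")
```

NAN leaves the decision to the learner (some models handle NaN natively), while DROP trades fewer samples for cleaner ones.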

  28. (2) Preprocessing – Example
      Figure: component diagram with Service, Host, Disk 1, Disk 2, Disk 3 and Network, shown next to the config:
        ...
        samplingMode: "PER_EVENT"
        leadTime: 5
        observationWindowsBoxes:
          CPU_LOAD:
            - size: 15
              aggregationFunctions:
                - "MIN"
                - "MAX"
              ...
          DISK_WRITE:
            - size: 30
              aggregationFunctions:
                - "AVG"
                - "MIN"
                - "MAX"
                - "STD_DEV"
              combinationFunctions:
                - "AVG"
                - "MIN"
                - "MAX"
                - "AVG"
          BYTES_SENT:
            - size: 5
              aggregationFunctions:
                - "NONE"
              ...
            - size: 30
              aggregationFunctions:
                - "AVG"
              ...
        ...
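
The difference between `aggregationFunctions` and `combinationFunctions` for DISK_WRITE on slide 28 can be illustrated with a small sketch. It assumes (this is not spelled out on the slide) that each aggregation function first reduces the 30-minute window of every single disk to one value, and each combination function then merges these per-disk values across Disk 1–3 into one feature. The `<series>:<aggregation>:<combination>` column naming is a hypothetical extension of the `CPU_LOAD:AVG` header from slide 26, treating STD_DEV as the population standard deviation is an arbitrary choice, and the "NONE" aggregation is not covered.

```python
from statistics import mean, pstdev

# Hypothetical mapping from the config's function names to Python callables.
FUNCTIONS = {"AVG": mean, "MIN": min, "MAX": max, "STD_DEV": pstdev}


def component_features(windows: dict[str, list[float]],
                       aggregations: list[str],
                       combinations: list[str],
                       series_name: str = "DISK_WRITE") -> dict[str, float]:
    """windows maps each component (e.g. "Disk 1") to the values in its observation window."""
    features = {}
    for agg in aggregations:
        # Step 1: reduce each component's window to a single value.
        per_component = [FUNCTIONS[agg](values) for values in windows.values()]
        for comb in combinations:
            # Step 2: combine the per-component values into one feature.
            features[f"{series_name}:{agg}:{comb}"] = FUNCTIONS[comb](per_component)
    return features
```

Combining across components keeps the feature vector the same length no matter how many disks a host has, which matters when one model is trained on data from many different systems.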

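  35. (3) Prediction
      • Data: 20 days, 250 systems, 34 time series, turned into feature vectors by the preprocessing framework
      • Split along the time axis t: 14 days for training, the following 6 days for testing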
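
A chronological 14-day/6-day split like the one on slide 35 can be sketched as follows. The record layout (timestamp, features, label) and the function name are assumptions; the point is that every test sample lies strictly after every training sample.

```python
from datetime import datetime, timedelta


def chronological_split(samples, start: datetime, train_days: int = 14, test_days: int = 6):
    """Split (timestamp, features, label) samples by time rather than at random.

    Training covers [start, start + train_days); testing covers the next
    test_days. Samples outside both ranges are left out.
    """
    split = start + timedelta(days=train_days)
    end = split + timedelta(days=test_days)
    train = [s for s in samples if start <= s[0] < split]
    test = [s for s in samples if split <= s[0] < end]
    return train, test
```

A random split would likely place neighboring, highly correlated 1-minute samples on both sides, so the time-based split gives a more realistic estimate of how well future events can be predicted.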
