Using Multi-System Monitoring Time Series to Predict Performance Events
Andreas Schörgenhumer, Mario Kahlhofer, Peter Chalupar, Hanspeter Mössenböck, Paul Grünbacher
09.11.2018
Motivation
[Figure: time series (t) → Train → ML model → Predict (t)]
Straightforward case:
• Single system
• Single component
• Univariate time series
Motivation
• Multiple systems
• Multiple, interlinked components
• Multivariate time series
• Event-to-data connection
[Figure: data from multiple systems (…) used to train an ML model]
Approach
[Figure: Multi-System Data → Preprocessing Framework (controlled by Configs) → CSVs → ML]
(1) Data
(2) Preprocessing
(3) Prediction
(1) Data
250 systems, 20-day export, 1-minute resolution
[Figure: data model. System (1) to Service (*); Service (*) to Host (*); Host (1) to Disk (*); Host (1) to Network Interface (*)]
• Events: service slowdowns (attached to services)
• Host: 11 time series (CPU load, Memory available, SWAP available, …)
• Disk: 13 time series (Available, Read time, Write time, …)
• Network interface: 10 time series (Bytes received, Bytes sent, Packets dropped, …)
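To make the data model concrete, here is a minimal Python sketch of the component hierarchy, assuming each time series is stored as a list of per-minute values keyed by a metric name. The class names, the metric keys (e.g. `WRITE_TIME`), and the helper `related_series` are hypothetical; the helper only illustrates how a slowdown event on a service could be connected to the host, disk, and network-interface series behind it.

```python
from dataclasses import dataclass, field

# Hypothetical storage layout: metric name -> per-minute values.
Series = dict[str, list[float]]


@dataclass
class Disk:
    name: str
    series: Series = field(default_factory=dict)


@dataclass
class NetworkInterface:
    name: str
    series: Series = field(default_factory=dict)


@dataclass
class Host:
    name: str
    series: Series = field(default_factory=dict)
    disks: list[Disk] = field(default_factory=list)
    interfaces: list[NetworkInterface] = field(default_factory=list)


@dataclass
class Service:
    name: str
    hosts: list[Host] = field(default_factory=list)     # services and hosts: many-to-many
    slowdowns: list[int] = field(default_factory=list)  # minutes at which slowdown events occurred


def related_series(service: Service) -> Series:
    """Collect every host, disk, and interface series reachable from a service."""
    out: Series = {}
    for host in service.hosts:
        for metric, values in host.series.items():
            out[f"{host.name}/{metric}"] = values
        for comp in [*host.disks, *host.interfaces]:
            for metric, values in comp.series.items():
                out[f"{host.name}/{comp.name}/{metric}"] = values
    return out


disk = Disk("disk1", {"WRITE_TIME": [1.0, 2.0]})
host = Host("host1", {"CPU_LOAD": [0.5, 0.7]}, disks=[disk])
svc = Service("svc1", hosts=[host], slowdowns=[1])
print(sorted(related_series(svc)))  # ['host1/CPU_LOAD', 'host1/disk1/WRITE_TIME']
```

Flattening the hierarchy into prefixed keys is one simple way to turn a service's surroundings into a single multivariate input; the real framework may organize this differently.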
(2) Preprocessing – Framework
• Input: YAMLs (configurations/configs)
  • Contain all necessary data processing settings
  • Easily changeable due to the YAML format
• Output: CSVs (feature vectors)
  • Portable format, directly usable for ML

Example input config (one of several YAML files):
  systems:
    - "sys1"
    - "sys2"
  timeSeries:
    - CPU_LOAD
  from: "2018-01-19 09:00"
  to: "2018-02-02 09:00"
  ...
  leadTime: 0
  observationWindowsBoxes:
    CPU_LOAD:
      - size: 60
        step: 1
        aggregationFunctions:
          - "AVG"
        combinationFunctions:
          - "AVG"
  samplingMode: "PER_EVENT"
  missingDataPointMode: "NAN"
  addAttributes: true
  ...

Resulting feature CSV produced by the Preprocessing Framework:
  CPU_LOAD:AVG   System   Label
  0.95           sys1     Event
  0.71           sys2     No event
  0.90           sys2     Event
  0.87           sys2     No event
  0.84           sys1     No event
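Conceptually, each CSV row can be thought of as one labelled sample time for which an observation window is aggregated. This minimal Python sketch assumes per-minute lists indexed by minute and a window covering the `size` minutes that end `leadTime` minutes before the sample time; the function names, the window boundaries, and the NaN fallback for incomplete windows are assumptions, not the framework's actual implementation.

```python
import math
from statistics import fmean

# Aggregation functions named as in the config (subset; hypothetical mapping).
AGGREGATIONS = {"AVG": fmean, "MIN": min, "MAX": max}


def window_feature(series, event_minute, size, lead_time, agg="AVG"):
    """Aggregate the `size` minutes that end `lead_time` minutes before `event_minute`."""
    end = event_minute - lead_time  # exclusive end of the observation window
    start = end - size
    if start < 0 or end > len(series):
        return math.nan             # window not fully covered by the data
    return AGGREGATIONS[agg](series[start:end])


def feature_rows(cpu_by_system, samples, size=60, lead_time=0):
    """Build CSV-like rows (CPU_LOAD:AVG, System, Label) from labelled sample times."""
    rows = [("CPU_LOAD:AVG", "System", "Label")]
    for system, minute, is_event in samples:
        value = window_feature(cpu_by_system[system], minute, size, lead_time)
        rows.append((round(value, 2), system, "Event" if is_event else "No event"))
    return rows


cpu = {"sys1": [0.95] * 120, "sys2": [0.71] * 120}
for row in feature_rows(cpu, [("sys1", 90, True), ("sys2", 100, False)]):
    print(row)
```

A larger `lead_time` shifts the window further into the past, so a model trained on such rows would have to predict an event that many minutes ahead.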
(2) Preprocessing – Config Settings
  Setting                    Example
  Systems                    [sys1, sys2, ...]
  Time series                [Host: CPU_LOAD, Disk: AVAILABLE, ...]
  Time frame                 From: 19-01-2018 09:00, To: 02-02-2018 09:00
  Sampling mode              PER_EVENT, SLIDE_THROUGH
  Negative sampling source   NON_EVENT_SERVICES, EVENT_SERVICES, ...
  Lead time                  10 min
  Observation windows        60 min, AVG aggregation, AVG combination
  Missing data mode          DROP, NAN, LAST_VALUE, ...
  Metadata                   System, special attributes, ...
  ...                        ...
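The missing data modes in the table can be illustrated with a small Python sketch. The semantics below are assumptions inferred from the mode names and from the config key `missingDataPointMode`, which suggests a per-point setting: `DROP` removes missing points, `NAN` replaces them with NaN, and `LAST_VALUE` carries the last observed value forward. Missing points are represented as `None`, and `fill_missing` is a hypothetical helper.

```python
import math


def fill_missing(window, mode):
    """Apply a missing-data mode to one observation window (None = missing point).

    Assumed semantics, inferred from the mode names:
      DROP       -> remove missing points before aggregation
      NAN        -> replace each missing point with NaN
      LAST_VALUE -> repeat the last observed value (NaN if none seen yet)
    """
    if mode == "DROP":
        return [v for v in window if v is not None]
    if mode == "NAN":
        return [math.nan if v is None else v for v in window]
    if mode == "LAST_VALUE":
        filled, last = [], math.nan
        for v in window:
            if v is not None:
                last = v
            filled.append(last)
        return filled
    raise ValueError(f"unknown missing-data mode: {mode}")


window = [1.0, None, 3.0]
for mode in ("DROP", "NAN", "LAST_VALUE"):
    print(mode, fill_missing(window, mode))
```

With `[1.0, None, 3.0]` the three modes yield `[1.0, 3.0]`, `[1.0, nan, 3.0]`, and `[1.0, 1.0, 3.0]`. Note that under `NAN` a subsequent AVG aggregation would itself become NaN, leaving the decision of how to treat such features to the ML step.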
(2) Preprocessing – Example
[Figure: component graph with Service, Host, Disk 1, Disk 2, Disk 3, and Network]
  ...
  samplingMode: "PER_EVENT"
  leadTime: 5
  observationWindowsBoxes:
    CPU_LOAD:
      - size: 15
        aggregationFunctions:
          - "MIN"
          - "MAX"
    ...
    DISK_WRITE:
      - size: 30
        aggregationFunctions:
          - "AVG"
          - "MIN"
          - "MAX"
          - "STD_DEV"
        combinationFunctions:
          - "AVG"
          - "MIN"
          - "MAX"
          - "AVG"
    BYTES_SENT:
      - size: 5
        aggregationFunctions:
          - "NONE"
        ...
      - size: 30
        aggregationFunctions:
          - "AVG"
        ...
  ...
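In the example, `DISK_WRITE` is measured on three disks, so per-disk aggregates have to be merged into one feature per aggregation. The following minimal Python sketch assumes that the i-th entry of `combinationFunctions` merges the i-th entry of `aggregationFunctions` across disks, so the combinations AVG, MIN, MAX, AVG pair with the aggregations AVG, MIN, MAX, STD_DEV. This pairing, the use of the population standard deviation for `STD_DEV`, and feature names such as `DISK_WRITE:AVG:AVG` are assumptions.

```python
from statistics import fmean, pstdev

# Config function names -> Python functions (STD_DEV as population std. dev. is an assumption).
FUNCS = {"AVG": fmean, "MIN": min, "MAX": max, "STD_DEV": pstdev}


def combined_features(windows_by_disk, metric, aggregations, combinations):
    """Aggregate each disk's window, then merge the per-disk values.

    Assumption: the i-th combination function merges the i-th aggregate
    across all disks (e.g. MAX over the per-disk MAX values).
    """
    if len(aggregations) != len(combinations):
        raise ValueError("expected one combination function per aggregation")
    features = {}
    for agg, comb in zip(aggregations, combinations):
        per_disk = [FUNCS[agg](window) for window in windows_by_disk.values()]
        features[f"{metric}:{agg}:{comb}"] = FUNCS[comb](per_disk)
    return features


# Two points per disk instead of a 30-minute window, for brevity.
windows = {"Disk 1": [1.0, 3.0], "Disk 2": [2.0, 2.0], "Disk 3": [0.0, 6.0]}
features = combined_features(
    windows, "DISK_WRITE",
    aggregations=["AVG", "MIN", "MAX", "STD_DEV"],
    combinations=["AVG", "MIN", "MAX", "AVG"],
)
for name, value in features.items():
    print(name, round(value, 3))
```

Combining across components keeps the number of features fixed even when hosts have different numbers of disks, which is what makes feature vectors from different systems comparable in one CSV.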
(3) Prediction
20 days, 250 systems, 34 time series → Preprocessing Framework
[Figure: timeline t split into 14 days (14d) for training and 6 days (6d) for testing]
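The 14-day/6-day split along t can be sketched as a chronological split in Python. The row layout `(timestamp, features, label)`, the start date, and the function name are hypothetical; splitting by time rather than at random keeps every test sample later than all training samples, which mirrors how predictions would be made on future data.

```python
from datetime import datetime, timedelta


def split_by_time(rows, start, train_days=14, test_days=6):
    """Chronological split: first `train_days` for training, next `test_days` for testing.

    `rows` are (timestamp, features, label) tuples; this layout is hypothetical.
    """
    cut = start + timedelta(days=train_days)
    end = cut + timedelta(days=test_days)
    train = [row for row in rows if start <= row[0] < cut]
    test = [row for row in rows if cut <= row[0] < end]
    return train, test


start = datetime(2018, 1, 1)  # hypothetical start of a 20-day export
rows = [(start + timedelta(days=d), [0.5], "No event") for d in (0, 13, 14, 19, 20)]
train, test = split_by_time(rows, start)
print(len(train), len(test))  # 2 2
```

The row at day 20 falls outside the 20-day range and is ignored, so samples beyond the export window never leak into either set.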