

  1. Using Machine Learning for Intelligent Storage Performance Anomaly Detection
     Ramakrishna Vadla, IBM; Archana Chinnaiah, IBM
     Acknowledgements: Sumant Padbidri, Anbazhagan Mani

  2. Agenda
     • Market Estimates & Forecasts
     • Applications in Storage
     • Cloud Architecture
     • Anomaly Detection
     • Performance Anomaly Detection

  3. AI & ML: Market Estimates & Forecasts
     • Worldwide revenues for cognitive and AI systems will increase from $12.5B in 2017 to more than $46B in 2020.
     • IDC forecasts that spending on AI and ML will grow from $12B in 2017 to $57.6B by 2021.
     • Machine learning patents grew at a 34% rate between 2013 and 2017, the third-fastest-growing category of all patents granted. Source: IFI Claims Patent Services (Patent Analytics), 8 Fastest Growing Technologies SlideShare presentation.
     Source: http://www.forbes.com

  4. AI & ML Market Estimates & Forecasts: Why Now?
     • Enormous growth in data: 90% of data was created in the last couple of years
     • Substantially more powerful computer hardware (CPUs, GPUs)
     • Cloud makes big data more widely accessible
     • Significantly improved algorithms
     Source: Deloitte Global Predictions 2018 Infographics

  5. Machine Learning Applications in Storage
     Applications:
     • Predictive analytics
     • Capacity forecasting (regression)
     • Power consumption in data centers (regression)
     • Tracking of known issues: learn from other customers' issues (classification)
     • Predicting blocks to be accessed in the near future (recommendations)
     • Performance anomaly detection
     • Performance metrics analysis (time-series data analysis)
     • Automated triaging and root cause analysis (classification)
     • Log analysis (clustering)
     • Configuration best-practice recommendations
     • Manual/automated upgrades and health checks
     • Configuration validation to avoid service interruptions
     • Intelligent performance tuning
     Value proposition:
     • Prevent issues proactively, before they occur
     • Avoid downtime and achieve 99.999% uptime
     • Cost efficiency: reduce storage and operational costs
     • Data storage optimization
     • Simplified support
     • Proactive notification of risks
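Capacity forecasting is listed as a regression problem. A minimal sketch, assuming daily used-capacity samples for one pool and a plain linear trend (real usage is often seasonal or stepwise): fit ordinary least squares and project the day the trend line crosses capacity. The names `fit_line` and `days_until_full` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def days_until_full(used_tb, capacity_tb):
    """Days after the last sample until the linear trend reaches capacity.

    used_tb: hypothetical daily used-capacity samples in TB, oldest first.
    Returns None when usage is flat or shrinking (no projected exhaustion).
    """
    if len(used_tb) < 2:
        raise ValueError("need at least two samples to fit a trend")
    days = list(range(len(used_tb)))
    intercept, slope = fit_line(days, used_tb)
    if slope <= 0:
        return None
    day_full = (capacity_tb - intercept) / slope
    return max(0.0, day_full - days[-1])


# Hypothetical pool: 10 TB capacity, usage growing 0.5 TB/day
print(days_until_full([6.0, 6.5, 7.0, 7.5, 8.0], capacity_tb=10.0))  # 4.0
```

A linear model keeps the sketch readable; the same structure accepts any regressor that can extrapolate a usage curve.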

  6. Cloud Architecture: Storage Analytics
     "The world's most valuable resource is no longer oil, but data." (www.economist.com)
     • Cloud-based scale-out architecture
     • Storage systems support high-frequency data collection (intervals of seconds to minutes)
     • More data available for analysis
     • Data lake deployed on the cloud, based on NoSQL stores such as Cassandra
     • All clients send storage metric data (performance, configuration and health data) to the cloud
     • Multi-tenancy support
     • Support for integration of ML tools
     [Diagram: multiple clients send data to a cloud data lake; analytics components shown: Elasticsearch, IBM Watson, Hadoop, Spark]
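To make the multi-tenant data lake concrete, here is an in-memory sketch, assuming each client is identified by a tenant ID and samples are grouped by a (tenant, system, day) partition key, similar to how a wide-column store such as Cassandra groups rows. `MetricSample`, `MetricStore` and all field names are hypothetical; this is not a Cassandra client.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class MetricSample:
    tenant_id: str       # hypothetical: identifies the client organization
    system_id: str       # storage system that reported the sample
    category: str        # "performance", "config" or "health"
    name: str            # metric name, e.g. "read_iops" (hypothetical)
    timestamp: datetime
    value: float


class MetricStore:
    """In-memory stand-in for the cloud data lake (hypothetical)."""

    def __init__(self):
        # Partition key (tenant, system, day) groups rows for one system per day
        self._partitions = defaultdict(list)

    def ingest(self, sample: MetricSample) -> None:
        key = (sample.tenant_id, sample.system_id, sample.timestamp.date())
        self._partitions[key].append(sample)

    def query(self, tenant_id: str, system_id: str, day: date) -> list:
        # Multi-tenancy: every lookup is scoped by tenant_id, so one client
        # never reads another client's partitions.
        return list(self._partitions.get((tenant_id, system_id, day), []))


store = MetricStore()
store.ingest(MetricSample("acme", "array-01", "performance", "read_iops",
                          datetime(2018, 5, 5, 9, 11, 20), 1520.0))
store.ingest(MetricSample("acme", "array-01", "health", "failed_disks",
                          datetime(2018, 5, 5, 9, 12, 0), 0.0))
print(len(store.query("acme", "array-01", date(2018, 5, 5))))  # 2
print(store.query("globex", "array-01", date(2018, 5, 5)))     # []
```

Scoping the partition key by tenant keeps each client's data physically grouped and makes cross-tenant reads impossible through this interface.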

  7. Machine Learning: Anomaly Detection
     • Supervised learning: predict based on training data that contains the desired outputs; the training data is labeled as normal or anomalous.
       Techniques: regression, classification, decision trees, random forests, k-nearest neighbors, SVM
     • Unsupervised learning: the training data does not include desired outputs, and the goal is to discover patterns. No labels are provided; the assumption is that anomalies are very rare compared to normal data.
       Techniques: clustering (K-Means, hierarchical, DBSCAN), time-series analysis (ARIMA)
     • Semi-supervised learning: the training data includes only a few desired outputs, i.e. it contains labels for normal samples only.
     • Reinforcement learning: learn from the rewards of a sequence of actions.
       Agent → Action → Environment → Reward & State → Agent (Markov Decision Process)
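As an illustration of the supervised case, the sketch below uses k-nearest neighbors, one of the listed techniques, on samples already labeled "normal" or "anomaly". The feature choice (response time in ms, I/O rate in thousands of IOPS) and all data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label a query point by majority vote of its k nearest labeled neighbors.

    train: list of (features, label) pairs with labels "normal" or "anomaly".
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled samples: (response time ms, I/O rate in thousands of IOPS)
train = [
    ((2.0, 10.0), "normal"),
    ((2.5, 11.0), "normal"),
    ((3.0, 9.5), "normal"),
    ((25.0, 2.0), "anomaly"),
    ((30.0, 1.5), "anomaly"),
    ((28.0, 2.5), "anomaly"),
]
print(knn_predict(train, (2.8, 10.2)))   # normal
print(knn_predict(train, (27.0, 2.2)))   # anomaly
```

The features are left unscaled for readability; with real metrics of very different magnitudes, normalizing each feature first keeps one metric from dominating the distance.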

  8. Storage Performance Challenges
     Bottlenecks:
     • Disk failure/inaccessible disks
     • Read/write I/O errors
     • Volume issues
     • Port masking
     • Configuration issues: host, storage subsystem, port, interoperability
     • Network congestion
     • Workload configurations
     • UPS battery failure
     Metrics:
     • I/O rate (R/W)
     • Data rate (R/W)
     • Response time (R/W)
     • Cache hit (R/W)
     • Data block size (R/W)
     • Port data rate (R/W)
     • Port-local node queue time
     • Port protocol errors
     • Port congestion
     Correlations:
     • CPU & network traffic
     • CPU & memory
     • Port & host counters
     • IOPS, read rate, CPU & memory
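The correlations column pairs metrics that tend to move together. A minimal sketch, assuming aligned per-interval samples for each metric, computes the Pearson coefficient for every metric pair and keeps pairs above a threshold. The metric names, data and the 0.8 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; treated as uncorrelated here
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(metrics, threshold=0.8):
    """Return (metric_a, metric_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(metrics, 2):
        r = pearson(metrics[a], metrics[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical per-interval samples from one storage node
metrics = {
    "cpu_util": [35.0, 42.0, 50.0, 61.0, 70.0, 78.0],  # percent
    "read_iops": [8000, 9500, 11000, 13500, 15000, 17000],
    "port_errors": [0, 1, 0, 0, 1, 0],
}
for a, b, r in correlated_pairs(metrics):
    print(f"{a} ~ {b}: r = {r:.3f}")
```

With this data only the CPU/read-IOPS pair is reported. Correlation flags metrics worth examining together; it does not by itself establish which one causes a bottleneck.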

  9. Performance Anomaly Detection: Clustering (Outlier Detection)
     [Figures: K-Means; DBSCAN]
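DBSCAN treats points that belong to no dense region as noise, which makes it a natural outlier detector. Below is a plain-Python sketch of DBSCAN (quadratic neighbor search, fine for small samples); the feature pairs, `eps=1.0` and `min_pts=3` are hypothetical and would need tuning on scaled real metrics.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id; -1 marks noise, i.e. outliers.

    min_pts counts the point itself, as in the original DBSCAN definition.
    """
    NOISE = -1
    labels = [None] * len(points)  # None = not yet visited

    def region(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be re-labeled later as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster_id
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep growing
        cluster_id += 1
    return labels


# Hypothetical samples: (response time ms, I/O rate in thousands of IOPS)
points = [
    (2.0, 10.0), (2.2, 10.5), (1.9, 9.8), (2.1, 10.2), (2.3, 9.9),  # workload A
    (5.0, 4.0), (5.2, 4.1), (4.9, 3.8), (5.1, 4.2),                 # workload B
    (20.0, 1.0),                                                    # isolated sample
]
labels = dbscan(points, eps=1.0, min_pts=3)
print([p for p, label in zip(points, labels) if label == -1])  # [(20.0, 1.0)]
```

K-Means, the other method on the slide, needs a different outlier rule (for example, distance to the nearest centroid), because it assigns every point to some cluster.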

  10. Performance Anomaly Detection: Time-Series Anomaly Detection
     • ARIMA: AutoRegressive Integrated Moving Average
     [Figure: IOPS rate anomaly]
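A deliberately simplified ARIMA sketch: the series is differenced once (the "I"), an AR(1) coefficient with a drift term is fitted to the differences by least squares, and one-step-ahead prediction errors that sit far from the typical error are flagged. This is ARIMA(1,1,0) only, with no moving-average term. The robust scale (median absolute deviation), the 3.0 threshold and the IOPS data are assumptions.

```python
import statistics


def arima_110_anomalies(series, threshold=3.0):
    """Return indices of series points with unusually large ARIMA(1,1,0) residuals."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    # "I": first differences remove a slowly changing level
    diffs = [b - a for a, b in zip(series, series[1:])]
    mu = statistics.mean(diffs)  # drift term
    centered = [d - mu for d in diffs]
    x, y = centered[:-1], centered[1:]
    # "AR(1)": least-squares coefficient linking each difference to the previous one
    denom = sum(a * a for a in x)
    phi = sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0
    # One-step-ahead prediction errors; residuals[k] belongs to series[k + 2]
    residuals = [b - phi * a for a, b in zip(x, y)]
    # Robust scale so the anomaly itself does not inflate the threshold
    center = statistics.median(residuals)
    scale = 1.4826 * statistics.median([abs(r - center) for r in residuals])
    if scale == 0:
        return [k + 2 for k, r in enumerate(residuals) if r != center]
    return [k + 2 for k, r in enumerate(residuals)
            if abs(r - center) > threshold * scale]


# Hypothetical IOPS samples with a spike at index 10
iops = [1000, 1010, 1005, 1015, 1008, 1012, 1006, 1014, 1009, 1011,
        1500, 1013, 1007, 1012, 1010]
print(arima_110_anomalies(iops))
```

The spike is flagged first, and the point or two after it may be flagged as well, because the next predictions are built from the contaminated value. For full ARIMA(p, d, q) models, a library such as statsmodels (`statsmodels.tsa.arima.model.ARIMA`) fits the moving-average terms too.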

  11. Log Analysis: Anomaly Detection
     Pipeline: Log Collection → Log Parsing → Feature Extraction → Anomaly Detection (Time-series Analysis)
     Sample log line: 2018-05-05 09:11:20.672 [<Device>] [<Thread>] [INFO] Processing complete.
     Extracted features: [timestamp, device, process state]
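The parsing and feature-extraction steps can be sketched with a regular expression modeled on the sample line. This assumes every line follows the `timestamp [device] [thread] [level] message` layout and treats the message text as the "process state". The pattern and function names are hypothetical. Counting events per minute and device turns the logs into time series that a detector like the ones on the previous slides can consume.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical pattern for lines such as:
# 2018-05-05 09:11:20.672 [<Device>] [<Thread>] [INFO] Processing complete.
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<device>[^\]]+)\] \[(?P<thread>[^\]]+)\] "
    r"\[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Log parsing: split one raw line into named fields (None if it doesn't match)."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["ts"] = datetime.strptime(fields["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return fields


def extract_features(lines):
    """Feature extraction: count events per (minute, device, process state)."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue  # unparseable lines are skipped in this sketch
        minute = record["ts"].replace(second=0, microsecond=0)
        counts[(minute, record["device"], record["message"])] += 1
    return counts


logs = [
    "2018-05-05 09:11:20.672 [array-01] [worker-3] [INFO] Processing complete.",
    "2018-05-05 09:11:45.100 [array-01] [worker-7] [INFO] Processing complete.",
]
for key, count in extract_features(logs).items():
    print(key, count)
```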

  12. Q & A Thank You
