Griffon: Reasoning about Job Anomalies with Unlabeled Data in - PowerPoint PPT Presentation

Griffon: Reasoning about Job Anomalies with Unlabeled Data in Cloud-based Platforms
Liqun Shao, Yiwen Zhu, Siqi Liu*, Abhiram Eswaran, Kristin Lieber, Janhavi Mahajan, Minsoo Thigpen, Sudhir Darbha, Subru Krishnan, Soundar Srinivasan, Carlo Curino, Konstantinos Karanasos
Microsoft, *University of Pittsburgh

Microsoft’s Internal Big Data Analytics Platform: 500K jobs/day, 250K nodes


My job is SLOWER…

On-Call Support Engineer Workflow (figure; durations shown: 57 mins, 88 mins)

• End-to-end: deployed and used
• Identifies job slowdown causes
• Reduces investigation time
• Consistent results validated by domain experts

Griffon: Before and After
Before Griffon: A job goes out of its service-level objectives (SLO), and the engineer is alerted. The engineer spends hours of manual labor looking through hundreds of metrics. After 2-3 days of investigation, the reason for the job slowdown is found.
After Griffon: A job goes out of SLO, and the engineer is alerted. The job ID and VC are fed through Griffon, and the top reasons for the job slowdown are generated automatically. All the metrics Griffon has looked at can be ruled out, and the engineer can direct their efforts to a smaller set of metrics. The reason is found in the top five generated by Griffon.

Griffon
• ML Methodology
• System Architecture

Challenges
• Data wrangling
• Data collection: identifying the right data; unlabeled data
• Model building: small amount of validation data; tradeoff between accuracy and interpretability; cannot maintain a model for each job template
• Deployment and scalability
• Evaluation: evaluation metrics for the root causes of slow jobs

Identify Job Slowdown Reasons: Job Runtime Predictor → Feature Contributions

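Job Runtime Prediction: mean absolute relative error (MARE) of the job runtime predictor (lower is better)
Model               LR     RF     GBT    DNN
Per-Template Model  0.186  0.116  0.124  0.146
Global Model        0.235  0.121  0.277  0.353
LR = linear regression, RF = random forest, GBT = gradient-boosted trees, DNN = deep neural network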
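The slide reports MARE without spelling out the formula. The sketch below assumes the usual definition, the mean over jobs of |predicted - actual| / actual; the runtimes in the usage line are hypothetical and are not taken from the evaluation.

```python
from typing import Sequence


def mare(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute relative error: average of |predicted - actual| / actual.

    Assumes the standard definition; actual runtimes must be positive.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual runtimes must be positive")
    return sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical runtimes in minutes: errors of 10%, 10% and 0%.
score = mare([10.0, 20.0, 40.0], [11.0, 18.0, 40.0])
print(score)  # ≈ 0.0667
```

Because each error is divided by the true runtime, a 1-minute miss on a 10-minute job weighs as much as a 10-minute miss on a 100-minute job, which makes the metric comparable across job templates of very different lengths.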

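Feature Contributions: reformulate the decision tree models as linear models, so that each prediction equals the intercept (bias) plus one contribution per feature; then compare these feature contributions to those of a baseline prediction.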

Feature Contributions (example): Intercept/Bias 10 m + InputSize 6 m + JobPriority (-4 m) + BonusPnHours (0 m) = Prediction 12 m
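A minimal sketch of one way to turn a single regression tree into this linear form, assuming the tree-path decomposition used by tools such as the treeinterpreter package (the slides do not state Griffon's exact procedure). Each step from a node to its child changes the predicted mean, and that change is credited to the feature the node splits on. The `Node` class, the thresholds, and the tree itself are hypothetical, chosen so the result reproduces the 10 m + 6 m - 4 m + 0 m = 12 m example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    value: float                    # mean runtime (minutes) of training jobs reaching this node
    feature: Optional[str] = None   # split feature; None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when x[feature] <= threshold
    right: Optional["Node"] = None  # followed when x[feature] > threshold


def decompose(root: Node, x: Dict[str, float]) -> Tuple[float, Dict[str, float]]:
    """Split one tree prediction into bias + per-feature contributions.

    Features the path never splits on keep a contribution of 0.
    """
    contributions = {name: 0.0 for name in x}
    node = root
    while node.feature is not None:
        child = node.left if x[node.feature] <= node.threshold else node.right
        contributions[node.feature] += child.value - node.value
        node = child
    # The bias is the root mean; bias + sum(contributions) equals the leaf value.
    return root.value, contributions


# Hypothetical tree and job, picked to match the numbers on the slide.
tree = Node(10.0, "InputSize", 100.0,
            left=Node(5.0),
            right=Node(16.0, "JobPriority", 1.0, left=Node(25.0), right=Node(12.0)))
job = {"InputSize": 500.0, "JobPriority": 3.0, "BonusPnHours": 2.0}
bias, contrib = decompose(tree, job)
print(bias, contrib, bias + sum(contrib.values()))
# 10.0 {'InputSize': 6.0, 'JobPriority': -4.0, 'BonusPnHours': 0.0} 12.0
```

For a random forest, the same decomposition can be averaged over the trees; for gradient-boosted trees the per-tree contributions are summed, since the model's prediction is a sum of trees.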

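Comparing a slow job with a baseline job:
Slow job: Intercept 10 m, InputSize +6 m, JobPriority -4 m, BonusPnHours 0 m, Prediction 12 m
Baseline job: Intercept 10 m, InputSize +3 m, JobPriority -2 m, BonusPnHours -4 m, Prediction 7 m
Contribution differences (slow minus baseline): InputSize 6 - 3 = +3 m; JobPriority -4 - (-2) = -2 m; BonusPnHours 0 - (-4) = +4 m. Since the intercepts are equal, these differences add up to the 12 - 7 = 5 m gap between the two predictions.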
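The comparison step can be sketched as a per-feature subtraction followed by a sort. Ranking by the signed difference (features that add the most runtime relative to the baseline come first) is an assumption made for illustration; the slides do not say how Griffon orders the reasons. The function name `rank_against_baseline` is hypothetical, and the contribution values are the ones on the slide.

```python
from typing import Dict, List, Tuple


def rank_against_baseline(slow: Dict[str, float],
                          baseline: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort features by how much more runtime (minutes) they contribute
    to the slow job than to the baseline job; missing features count as 0."""
    features = set(slow) | set(baseline)
    diffs = {f: slow.get(f, 0.0) - baseline.get(f, 0.0) for f in features}
    return sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)


slow = {"InputSize": 6.0, "JobPriority": -4.0, "BonusPnHours": 0.0}
baseline = {"InputSize": 3.0, "JobPriority": -2.0, "BonusPnHours": -4.0}
ranking = rank_against_baseline(slow, baseline)
print(ranking)
# [('BonusPnHours', 4.0), ('InputSize', 3.0), ('JobPriority', -2.0)]
```

Ranking by absolute difference instead would give the same order here (4, 3, 2), but the two choices can disagree when a feature makes the slow job noticeably faster than the baseline.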

Architecture


Azure Big Data Analytics Platform
Azure ML with MLflow:
• Archiving
• Versioning
• Serving

Flask Application

Griffon Output



Validation of Griffon Predictions
Job Id  Predicted Reason       Engineer-Validated Reason  Rank  Confidence Level
9182    Input size             Input size                 1     High
8578    Revocation             Revocation                 4     Medium
4414    Yarn or cluster issue  Yarn or cluster issue      -     Low
6170    PN hours               PN hours                   5     Medium
7588    Time skew              Time skew                  1     High
3798    PN hours               PN hours                   1     High
1590    PN hours               PN hours                   1     High
2560    Usable machine count   Usable machine count       2     High
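One simple way to summarize a table like this is a top-k hit rate: the fraction of jobs whose engineer-validated reason appears within Griffon's top k reasons. This is an illustrative metric, not necessarily the one the authors used. The sketch treats the "-" rank of job 4414 as "not ranked" (a miss), which is an assumption, and it is computed only over the eight rows shown, so it is not a reported result of the evaluation. The function name `top_k_hit_rate` is hypothetical.

```python
from typing import Optional, Sequence


def top_k_hit_rate(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of jobs whose validated reason is ranked within the top k.

    A rank of None means the validated reason was not ranked ("-" in the table).
    """
    if len(ranks) == 0:
        raise ValueError("ranks must be non-empty")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Rank column of the validation table, in row order (job 4414 has no rank).
ranks = [1, 4, None, 5, 1, 1, 1, 2]
print(top_k_hit_rate(ranks, 1))  # 0.5
print(top_k_hit_rate(ranks, 5))  # 0.875
```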

Scalability & Generalization

Conclusions
• End-to-end interpretable ranking system to identify the root causes of job slowdowns
• No human-labeled reasons needed
• Highly consistent results validated by on-call engineers
• Our model generalizes well when tested on job templates not included in the training set
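Testing on job templates that are absent from training requires splitting by template rather than by individual job; otherwise jobs from the same recurring template leak into both sets. A minimal sketch, assuming each job record carries a hypothetical `template` field; scikit-learn's `GroupShuffleSplit` implements the same grouped-split idea.

```python
import random
from typing import Dict, List, Sequence, Tuple

Job = Dict[str, str]


def split_by_template(jobs: Sequence[Job], test_fraction: float,
                      seed: int = 0) -> Tuple[List[Job], List[Job]]:
    """Hold out whole templates so no test template appears in training."""
    templates = sorted({job["template"] for job in jobs})
    rng = random.Random(seed)
    rng.shuffle(templates)
    # At least one held-out template, at most all but one.
    n_test = min(len(templates) - 1, max(1, round(len(templates) * test_fraction)))
    test_templates = set(templates[:n_test])
    train = [j for j in jobs if j["template"] not in test_templates]
    test = [j for j in jobs if j["template"] in test_templates]
    return train, test


# Hypothetical jobs: 12 runs spread over 4 recurring templates.
jobs = [{"job_id": str(i), "template": f"T{i % 4}"} for i in range(12)]
train, test = split_by_template(jobs, test_fraction=0.25)
print(len(train), len(test))  # 9 3
```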

Thank you! Please see our poster for more details ☺!
