Evaluation of a Failure Prediction Model for Large Scale Cloud - PowerPoint PPT Presentation

Evaluation of a Failure Prediction Model for Large Scale Cloud Applications Mohammad S. Jassas and Qusay H. Mahmoud Presentation at Canadian AI 2020

Introduction
▪ Cloud services → Complexity for cloud architectures.
▪ Cloud applications have a high probability of failure.
▪ Most cloud providers have experienced a failure in at least one of their services.
▪ AWS experienced a failure in its Elastic Block Store (EBS) service [7].
▪ Many organizations are planning to use public cloud environments.
▪ Cloud providers → Maintaining their services to provide cloud consumers with a high level of QoS.
[7] P. Marshall, K. Keahey, T. Freeman, Elastic site: Using clouds to elastically extend site resources, in: 10th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2010).

Problem Statement
• Providing 24x7 service uptime has become one of the most significant challenges facing cloud providers.
• Failed jobs consume a notable amount of computational resources and memory.

Objective
▪ Decrease the number of failed tasks
▪ High reliability + availability
▪ Minimize resource wastage, time + cost
▪ Increase the performance of cloud apps

Related Work
▪ Failure analysis and characterization have been studied widely in grid computing, cloud clusters, and supercomputers [1].
▪ The Google traces [3] have been used in a range of research studies, including workload characterization [5] and statistical analysis.
▪ In [2], we studied workload features such as memory usage, CPU speed, and disk space.
▪ Limited research has been done on failure prediction [4,5,6].
▪ El-Sayed et al. [4] designed a job failure prediction model using a random forest (RF) classifier.
[1] Chen, X., Lu, C.D., Pattabiraman, K.: Failure analysis of jobs in compute clouds: a Google cluster case study. IEEE (2014)
[2] Jassas, M., Mahmoud, Q.H.: Failure analysis and characterization of scheduling jobs in Google cluster trace. IEEE (2018)
[3] Reiss, C., Wilkes, J., Hellerstein, J.L.: Google cluster-usage traces: format + schema. Google Inc. (2011)
[4] El-Sayed, N., Zhu, H., Schroeder, B.: Learning from failure across multiple clusters: a trace-driven approach to understanding, predicting, and mitigating job terminations. IEEE (2017)
[5] Chen, X., Lu, C.D., Pattabiraman, K.: Failure analysis of jobs in compute clouds: a Google cluster case study. IEEE (2014)
[6] Ros, A., Chen, L.Y., Binder, W.: Predicting and mitigating jobs failures in big data clusters. In: 2015 15th IEEE/ACM (2015)

Proposed Solution

Experiments and Evaluation Results
▪ Trace Description (Google and LANL)
▪ Experimental Setup
▪ scikit-learn → ML packages in Python
▪ Microsoft Azure → The Google trace has large volumes of data, requiring HPC nodes for analysis and prediction.
Trace sources: google/cluster-data: Borg cluster traces from Google (GitHub); The Atlas Cluster Trace Repository (USENIX)

Experiments and Evaluation Results
▪ Classifiers and Prediction Techniques
Fig. 5. Performance evaluation of different algorithms applied to the Google trace
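A comparison of this kind can be sketched with scikit-learn. This is only a minimal illustration under stated assumptions: the data is synthetic (generated with `make_classification`, not the Google trace), the three classifiers are illustrative choices rather than the set evaluated in the slides, and the label convention (1 = failed job) is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for job records (assumption): label 1 = failed, 0 = finished.
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Illustrative classifier set; the slides do not list the exact algorithms here.
classifiers = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}


def evaluate(classifiers, X_train, y_train, X_test, y_test):
    """Fit each classifier and report standard metrics on the held-out split."""
    results = {}
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        pred = clf.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
        }
    return results


results = evaluate(classifiers, X_train, y_train, X_test, y_test)
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Precision, recall, and F1 are reported alongside accuracy because failed jobs are usually the minority class, so accuracy alone can look good for a model that rarely predicts failure.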

Experiments and Evaluation Results
Fig. 6. Performance evaluation of different algorithms applied to the Mustang and Trinity traces

Experiments and Evaluation Results
▪ Feature Selection Algorithms
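The slide does not name the feature selection algorithms used, so the following is only a sketch of two common scikit-learn options: a univariate filter (`SelectKBest` with the ANOVA F-score) and a ranking by random forest feature importances. The feature names and the synthetic data are hypothetical and are not taken from the Google or LANL traces.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical job attributes, used only to label the synthetic columns.
feature_names = [
    "cpu_request", "memory_request", "disk_request",
    "priority", "scheduling_class", "resubmissions",
]
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)

# Filter method: keep the k features with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
selected = [name for name, keep in zip(feature_names, selector.get_support()) if keep]

# Embedded method: rank features by random forest impurity-based importance.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print("SelectKBest kept:", selected)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

The two methods can disagree: the filter scores each feature in isolation, while the forest's importances reflect how features are used together in the trees.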

Conclusion and Future Work
• Developing a prediction model for failed jobs based on ML methods.
• Detecting failed jobs before the cloud management system schedules them.
• Increasing the reliability and availability of cloud job execution.
• Applying different classification algorithms to various workload traces.
• In future work, we will extend the proposed model with a deep learning approach to improve accuracy.
• Future research will also consider mitigation policies and techniques.
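Detecting likely failures before scheduling could look roughly like the sketch below. It is an assumption-laden illustration, not the authors' implementation: the helper `flag_risky_jobs`, the probability threshold, and the synthetic data are all hypothetical, and what the scheduler does with a flagged job is left open.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for historical job records (assumption): 1 = failed job.
X, y = make_classification(n_samples=1000, n_features=6, n_informative=4, random_state=1)
history_X, history_y = X[:800], y[:800]   # jobs with known outcomes
submitted_X = X[800:]                      # newly submitted, not yet scheduled

model = RandomForestClassifier(n_estimators=100, random_state=1)
model.fit(history_X, history_y)


def flag_risky_jobs(model, submitted, threshold=0.5):
    """Return a boolean mask: True where predicted failure probability >= threshold."""
    failure_col = list(model.classes_).index(1)
    failure_proba = model.predict_proba(submitted)[:, failure_col]
    return failure_proba >= threshold


flags = flag_risky_jobs(model, submitted_X, threshold=0.7)
print(f"{flags.sum()} of {flags.size} submitted jobs flagged as likely to fail")
```

Raising the threshold flags fewer jobs and reduces false alarms, at the cost of letting more eventual failures through to the scheduler.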

Mohammad S. Jassas Qusay H. Mahmoud qusay.mahmoud@ontariotechu.net mohammad.jassas@ontariotechu.net
