Automatic Virtual Machine Clustering based on Bhattacharyya Distance for Multi-Cloud Systems
C. Canali, R. Lancellotti
University of Modena and Reggio Emilia
MultiCloud - 22 April 2013 - Prague

Cloud computing challenges
● Large data centers (> 10^5 VMs) → huge amount of data
● Multiple data centers → geographic data exchange
→ Scalability problems
● Current approaches reduce the amount of data in a uniform way:
– Reduce sampling frequency
– Reduce number of metrics considered
→ Reduced monitoring effectiveness
– Less information available to make management decisions

Reference scenario
● IaaS with long-term commitment
● Reactive VM relocation
– Local scope
– Overload management
● Periodic global consolidation
– Global scope
– Server management
[Figure label: Geographic links]

Impact on monitoring scalability
● Methodology:
– Define quantitative model for VM behavior (time series)
– Define VM similarity (dist. matrix)
– Cluster similar VMs together
● Elect a few (e.g., 3) cluster representatives
● Fine-grained monitoring of cluster representatives
● Reduced monitoring applied to other VMs
– Reduced number of metrics
– Lower sampling frequency
[Flowchart: Data samples → Extract quantitative model of VM behavior → Histogram → Compute similarity between VM behavior → Distance matrix → Clustering → Clustering solution]

Impact on monitoring scalability
● Case study:
– E-health, Web-based application
– Deployed on cloud IaaS
● Numeric example:
– 110 VMs, K metrics, sampling frequency: 5 min → ~3.2·10^4 K samples/day
– 2 classes, 3 representatives per class → ~2.1·10^3 K samples/day
→ Monitoring data reduced by 1 order of magnitude
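The numeric example on this slide can be checked with a few lines of arithmetic. The sketch below counts samples per day per metric (multiply by K for the totals on the slide). The slide does not say how the non-representative VMs are monitored, so the 6-hour sampling interval used for them here is purely an assumption chosen for illustration; the function names are hypothetical as well.

```python
# Samples per day for one metric; multiply by K for the totals on the slide.
SAMPLES_PER_DAY_FULL = 24 * 60 // 5   # one sample every 5 minutes -> 288
SAMPLES_PER_DAY_REDUCED = 24 // 6     # ASSUMPTION: one sample every 6 hours -> 4


def daily_samples_uniform(n_vms):
    """Every VM is monitored at the full sampling frequency."""
    return n_vms * SAMPLES_PER_DAY_FULL


def daily_samples_clustered(n_vms, n_classes, reps_per_class):
    """Only cluster representatives keep the full sampling frequency."""
    representatives = n_classes * reps_per_class
    others = n_vms - representatives
    return (representatives * SAMPLES_PER_DAY_FULL
            + others * SAMPLES_PER_DAY_REDUCED)


baseline = daily_samples_uniform(110)            # about 3.2 * 10^4
clustered = daily_samples_clustered(110, 2, 3)   # about 2.1 * 10^3 under the assumption
print(baseline, clustered, round(baseline / clustered, 1))
```

Under this assumed reduced rate the ratio lands between 10 and 20, which is consistent with the "1 order of magnitude" claim; a different reduced-monitoring policy would give a different figure.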

Modeling VM behavior
● Model based on probability distribution of resource usage
– Multiple resources considered (metrics)
● Histogram for every metric, every VM
– Normalized histogram (∑h=1)
– B: number of buckets (critical)
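A minimal sketch of the normalized histogram described on this slide, for one metric of one VM. The fixed value range (`lo`, `hi`) and the clamping of out-of-range samples are assumptions made for the example; the slide only states that the histogram sums to 1 and that the number of buckets B matters.

```python
def normalized_histogram(samples, n_buckets, lo, hi):
    """Return B = n_buckets relative frequencies that sum to 1.

    `lo` and `hi` bound the metric's value range (assumed known in advance,
    e.g. 0..100 for a CPU utilisation percentage).
    """
    if n_buckets <= 0 or hi <= lo:
        raise ValueError("need n_buckets > 0 and hi > lo")
    if not samples:
        raise ValueError("need at least one sample")
    width = (hi - lo) / n_buckets
    counts = [0] * n_buckets
    for x in samples:
        idx = int((x - lo) / width)
        idx = min(max(idx, 0), n_buckets - 1)  # clamp outliers and x == hi
        counts[idx] += 1
    total = len(samples)
    return [c / total for c in counts]


cpu_samples = [5, 15, 15, 95]           # made-up utilisation values
hist = normalized_histogram(cpu_samples, n_buckets=4, lo=0, hi=100)
print(hist)
```

Using the same `lo`, `hi` and B for every VM keeps the histograms of a given metric directly comparable bucket by bucket.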

Defining VM similarity
● Use of Bhattacharyya distance
– Determine a distance matrix over every pair of VMs, for each metric
● Euclidean combination of distance matrices
– Sum of squares of multiple distances
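The similarity step can be sketched as follows, using the standard definition of the Bhattacharyya distance between two normalized histograms, D_B(p, q) = -ln ∑ sqrt(p_i · q_i). The slide says only "sum of squares" for the Euclidean combination; taking the square root of that sum, the `eps` floor for non-overlapping histograms, and the dict-per-VM data layout are assumptions of this sketch.

```python
import math


def bhattacharyya_distance(p, q, eps=1e-12):
    """Standard Bhattacharyya distance between two normalized histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of buckets")
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    bc = min(max(bc, eps), 1.0)  # eps avoids log(0) when histograms do not overlap
    return -math.log(bc)


def combined_distance(vm_a, vm_b):
    """Euclidean combination over metrics (square root is an assumption).

    vm_a and vm_b map a metric name to that VM's normalized histogram.
    """
    return math.sqrt(sum(bhattacharyya_distance(vm_a[m], vm_b[m]) ** 2
                         for m in vm_a))


def distance_matrix(vms):
    """Square, symmetric matrix of combined distances with a zero diagonal."""
    n = len(vms)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = combined_distance(vms[i], vms[j])
    return dist


# Made-up histograms for two metrics and three VMs.
vms = [
    {"cpu": [1.0, 0.0], "net": [0.5, 0.5]},
    {"cpu": [0.5, 0.5], "net": [0.5, 0.5]},
    {"cpu": [1.0, 0.0], "net": [0.5, 0.5]},
]
D = distance_matrix(vms)
```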

Clustering algorithm
● Use of spectral clustering algorithm
– Input: square, symmetric distance matrix
– Output: cluster ID for every VM
● Additional feature:
– Number of clusters can be automatically determined through spectral gap analysis
● Open problems:
– Is it correct to consider every metric together?
– Is there a way to select the right metrics?
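A rough sketch of spectral clustering with spectral gap analysis, written with NumPy. The slide does not say which variant the authors use, so several choices here are assumptions: the Gaussian kernel that turns distances into similarities (with a hand-picked `sigma`), the symmetric normalized Laplacian, and the small deterministic k-means used on the eigenvector rows.

```python
import numpy as np


def laplacian_eigs(D, sigma):
    """Eigen-decomposition of the normalized Laplacian built from distances D."""
    D = np.asarray(D, dtype=float)
    A = np.exp(-(D ** 2) / (2.0 * sigma ** 2))   # ASSUMPTION: Gaussian kernel
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigh(L)                     # eigenvalues in ascending order


def choose_k(eigvals, k_max):
    """Spectral gap analysis: k sits just before the largest eigenvalue jump."""
    gaps = np.diff(eigvals[: k_max + 1])
    return int(np.argmax(gaps)) + 1


def kmeans(X, k, iters=50):
    """Tiny k-means with farthest-point initialisation (deterministic)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_clustering(D, sigma, k_max=5):
    """Return (number of clusters, cluster ID for every VM)."""
    eigvals, eigvecs = laplacian_eigs(D, sigma)
    k = choose_k(eigvals, k_max)
    U = eigvecs[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # row-normalize embedding
    return k, kmeans(U, k)


# Made-up distance matrix: two tight groups of four VMs, far from each other.
D = np.full((8, 8), 2.0)
D[:4, :4] = 0.1
D[4:, 4:] = 0.1
np.fill_diagonal(D, 0.0)
k, labels = spectral_clustering(D, sigma=0.5, k_max=5)
print(k, labels)
```

The kernel width `sigma` strongly affects the eigenvalue gaps, so in practice it would need tuning against the scale of the distances.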

Choosing the right metrics
● Multiple metrics are merged into the final distance matrix
● Not every metric provides significant information
● Proposal to identify relevant metrics
– Consider auto-correlation: ACF decreasing rapidly → random variations
– Consider coefficient of variation (CV):
  CV » 1 → spiky and noisy behavior
  CV « 1 → little information provided
→ Merge information from metrics with
– ACF decreasing slowly
– CV ~ 1
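The two selection criteria on this slide can be sketched with the sample autocorrelation function and the coefficient of variation (standard deviation divided by mean). The slide gives no numeric thresholds, so `acf_min`, `cv_low` and `cv_high` below are hypothetical values chosen only to make the example concrete, as is the use of a single lag to judge how quickly the ACF decays.

```python
import statistics


def autocorrelation(series, lag):
    """Sample autocorrelation of a time series at the given lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be in [0, len(series))")
    mean = statistics.fmean(series)
    denom = sum((x - mean) ** 2 for x in series)
    if denom == 0:
        return 0.0  # constant series: treated as carrying no structure
    num = sum((series[t] - mean) * (series[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


def coefficient_of_variation(series):
    """Population standard deviation divided by the absolute mean."""
    mean = statistics.fmean(series)
    if mean == 0:
        raise ValueError("coefficient of variation undefined for zero mean")
    return statistics.pstdev(series) / abs(mean)


def is_informative(series, lag=1, acf_min=0.5, cv_low=0.5, cv_high=2.0):
    """Keep a metric if its ACF decays slowly and its CV is near 1.

    All thresholds are hypothetical; the slide does not specify them.
    """
    slow_decay = autocorrelation(series, lag) >= acf_min
    cv = coefficient_of_variation(series)
    return slow_decay and cv_low <= cv <= cv_high


ramp = list(range(20))      # slowly varying, made-up series
alternating = [0, 1] * 10   # rapidly flipping, made-up series
print(is_informative(ramp), is_informative(alternating))
```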

Case study
● IaaS cloud supporting e-health
– Web server and DBMS
– 110 VMs
– 10 metrics for each VM
– Sampling frequency: 5 min
● Goal: separate Web servers and DBMS
– Main metric: purity of clustering
● Three types of analyses
– Impact of time series length
– Impact of metric selection techniques
– Impact of histogram characteristics
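Purity, the main metric named on this slide, is commonly defined as the fraction of items that belong to the majority class of their cluster. The sketch below uses that common definition, which the slide itself does not spell out; the labels are made up for illustration.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, true_classes):
    """Fraction of VMs that fall in the majority class of their cluster."""
    if len(cluster_ids) != len(true_classes) or not cluster_ids:
        raise ValueError("need two non-empty sequences of equal length")
    per_cluster = defaultdict(Counter)
    for cluster, cls in zip(cluster_ids, true_classes):
        per_cluster[cluster][cls] += 1
    majority_total = sum(max(counts.values())
                         for counts in per_cluster.values())
    return majority_total / len(cluster_ids)


# Made-up example: one DBMS VM ends up in the Web server cluster.
clusters = [0, 0, 0, 1, 1, 1]
truth = ["web", "web", "dbms", "dbms", "dbms", "dbms"]
score = purity(clusters, truth)
print(score)
```

A purity of 1 means every cluster contains VMs of a single class; the value drops as clusters mix Web servers and DBMS nodes.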

Impact of time series length
[Figure]

Impact of metric selection (1)
[Figure panels: Network I/O, Mem paging, # of procs.]

Impact of metric selection (2)
[Figure]

Impact of histogram characteristics
[Figure]

Conclusion and future work
● Scalability in (multi)cloud systems → open issue
● Proposal of a novel methodology to improve scalability through clustering of similar VMs
● Experimental results are encouraging
– Purity > 0.83 even for very short time series
● Future research directions:
– Validation with more data sets (Help!)
– Improving stability of the results w.r.t. histogram parameters
– Evaluate different models for VM behavior
– Application of clustering to improve scalability of VM management
