Capacity Allocation for Big Data Applications in the Cloud
27th April 2017, QUDOS 2017 @ ICPE Workshop, L’Aquila
Michele Ciavotta, Eugenio Gianniti, Danilo Ardagna
DICE Horizon 2020 Project, Grant Agreement no. 644869
Funded by the Horizon 2020 Framework Programme of the European Union
http://www.dice-h2020.eu
Outline
o Background and motivations
o D-SPACE4Cloud Tool
o Experimental results
o Conclusions and future work
Danilo Ardagna
Background
o Data-intensive applications (DIAs) hosted on public Clouds
o The goal is to optimize resource allocation at design time, taking into account quality of service (QoS) constraints
D-SPACE4Cloud Tool
The problem:
o Minimize costs and suggest the optimal deployment architecture that provides QoS guarantees
Innovation:
o Design space exploration has been increasingly sought in traditional multi-tier applications, but not in the design of DIAs
What does the tool do?
o Automatic analysis of multiple candidate alternative configurations to identify the minimum cost one
Impact & stakeholders:
o Designers and operators make more informed decisions about the technology to use
o Reduce costs of a shared cluster running multiple DIAs
Reference System
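Complete Optimization Problem

\begin{align}
\min_{x,\nu,s,R} \quad & \sum_{i \in C} \left( \sigma_{\tau_i} s_i + \pi_{\tau_i} R_i \right) \tag{P1a} \\
\text{subject to:} \quad & \sum_{j \in V} x_{ij} = 1, \quad \forall i \in C \tag{P1b} \\
& P_{i,\tau_i} = \sum_{j \in V} P_{ij} x_{ij}, \quad \forall i \in C \tag{P1c} \\
& \sigma_{\tau_i} = \sum_{j \in V} \sigma_j x_{ij}, \quad \forall i \in C \tag{P1d} \\
& \pi_{\tau_i} = \sum_{j \in V} \pi_j x_{ij}, \quad \forall i \in C \tag{P1e} \\
& x_{ij} \in \{0, 1\}, \quad \forall i \in C, \ \forall j \in V \tag{P1f} \\
& (\nu, s, R) \in \arg\min \sum_{i \in C} \left( \sigma_{\tau_i} s_i + \pi_{\tau_i} R_i \right) \tag{P1g} \\
\text{subject to:} \quad & s_i \le \frac{\eta_i}{1 - \eta_i} R_i, \quad \forall i \in C \tag{P1h} \\
& \nu_i = R_i + s_i, \quad \forall i \in C \tag{P1i} \\
& T\left( P_{i,\tau_i}, \nu_i; H_i, Z_i \right) \le D_i, \quad \forall i \in C \tag{P1j} \\
& \nu_i \in \mathbb{N}, \quad \forall i \in C \tag{P1k} \\
& R_i \in \mathbb{N}, \quad \forall i \in C \tag{P1l} \\
& s_i \in \mathbb{N}, \quad \forall i \in C \tag{P1m}
\end{align}

o Many integer variables and constraints make the problem intractable with exact methods
o We split the problem in two layers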
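As a small worked illustration of constraints (P1h) and (P1i): substituting R = ν − s into (P1h) gives s ≤ η ν, so for a fixed total ν the s-type share is capped at ⌊η ν⌋. The Python sketch below computes that split and the per-class cost term of (P1a). It assumes, beyond what the slide states, that σ ≤ π (so that using as many s-type instances as allowed is cheapest); the function names and the numbers in the usage note are hypothetical.

```python
from fractions import Fraction
from math import floor


def split_vms(nu, eta):
    """Split nu VMs into (R, s) with s as large as (P1h) and (P1i) allow.

    eta must be an exact Fraction in [0, 1) to avoid floating-point
    rounding when taking the floor.
    """
    s = floor(eta * nu)      # s <= eta * nu, derived from (P1h) with R = nu - s
    return nu - s, s         # (R, s), so that nu = R + s as in (P1i)


def class_cost(nu, eta, sigma, pi):
    """Per-class term of (P1a): sigma * s + pi * R.

    Assumption (not from the slide): sigma <= pi, so the maximal s is cheapest.
    """
    r, s = split_vms(nu, eta)
    return sigma * s + pi * r
```

For example, with the made-up values ν = 10 and η = 3/10, `split_vms(10, Fraction(3, 10))` yields R = 7 and s = 3, which satisfies (P1h) with equality.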
Local Search Motivations
o The mathematical programming problem is written with a raw performance prediction formula
o The optimum should also be accurate, hence we rely on simulation models
o There is the need to explore the design space
o The initial guess might turn out to be infeasible
o The initial guess might be overprovisioned
D-SPACE4Cloud Architecture
Local Search Method
o Apply hill climbing per class varying the VM allocation
o Evaluate the optimal configuration returned by (P1) to choose the climbing direction
o Remove instances if feasible
o Add more VMs if infeasible
o Stop after reaching the local optimum
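The steps above can be sketched for a single class as follows. This is a minimal illustration, not the tool's implementation: the feasibility check is passed in as a function, and `toy_time` is a hypothetical closed-form stand-in for the simulation models, with made-up parameters. The unit step size and the `max_vms` safety bound are also assumptions.

```python
def hill_climb(initial_vms, is_feasible, max_vms=10_000):
    """Return the smallest feasible VM count reachable by unit steps.

    initial_vms: starting guess, e.g. the value suggested by (P1).
    is_feasible: callable taking a VM count and returning True when the
                 deadline is met (assumed monotone in the VM count).
    """
    nu = max(1, initial_vms)
    if is_feasible(nu):
        # Feasible start: possibly overprovisioned, so remove instances
        # for as long as the configuration stays feasible.
        while nu > 1 and is_feasible(nu - 1):
            nu -= 1
        return nu
    # Infeasible start: add VMs until the deadline is met.
    while not is_feasible(nu):
        nu += 1
        if nu > max_vms:
            raise ValueError("no feasible allocation within max_vms")
    return nu


def toy_time(nu, serial=20.0, work=800.0):
    """Hypothetical execution-time model (ms): a serial part plus
    perfectly parallel work. Stands in for a simulation run."""
    return serial + work / nu


def meets_deadline(nu, deadline=100.0):
    """Feasibility check built on the toy model."""
    return toy_time(nu) <= deadline
```

With these made-up parameters the deadline is met from 10 VMs upward, so both an underprovisioned start, `hill_climb(4, meets_deadline)`, and an overprovisioned one, `hill_climb(25, meets_deadline)`, stop at 10.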
Simulation Models Validation
o TPC-DS benchmark, datasets ranging from 250 GB to 1 TB
o Experiments run on Amazon EC2, Cineca, Flexiant, with cluster sizes ranging from 20 to 240 cores
o Overall, 27,000 CPU hours worth of experiments
Optimal Cluster Cost
[Figure: two panels, "R1 - H10" and "R1 - H20", plotting cost [€/h] against deadline [ms] for CINECA and m4.xlarge]
Conclusions
o D-SPACE4Cloud minimizes the overall cost under QoS constraints
o The tool supports a search technique to compare various providers and offerings
o Since we rely on accurate simulation models, we can reasonably trust the optimal configuration returned
Future Work
o Exploit machine learning and insight on the problem to improve heuristics efficiency
o Consider private or hybrid Clouds by adding capacity constraints
o Address other technologies: Spark and Storm
Thanks!
www.dice-h2020.eu