Auto-scaling deadline-constrained workloads in containers in the cloud
Jay DesLauriers, Research Associate, University of Westminster
Project COLA
• Horizon 2020 • 33 months • Completion September 2019 • 14 Partners in 6 Countries • 10 SME/Public Sector • 4 HE/Research Institutions
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731574.
June 5th 2019, www.project-cola.eu
Head in the clouds
On-Premise: Capital Expense • High Upfront Cost • High Maintenance Cost
Off-Premise: Pay-as-you-go • No Upfront Cost • No Maintenance Cost
A match made in ...
Containers: operating-system virtualisation and application packaging for reusable, portable software
The Problem
Some requirements:
• Dynamic supply (auto-scaling)
• Vendor-free
• Modular
• Flexible
• Secure
[Figure: Applications 1..N are composed of Services 1-5 running on cloud services. Resource requirements consist of a baseline resource consumption plus a variable resource consumption. Demand is dynamic; manually adjusted supply is to be replaced by automatically adjusted supply.]
Finding a solution...
The Solution: MiCADO
TOSCA Application Description Template (ADT): describes application, infrastructure, scaling policies, security policies
MiCADO Master Node
• Submitter: translates the ADT
• Policy Keeper: enforces scaling
• Optimiser: ML-based optimisation
• Occopus: orchestrates VMs
• Kubernetes: orchestrates containers
• Prometheus: monitors VMs & containers
MiCADO Worker Node
• cAdvisor and Node Exporter: export VM/container metrics
• Docker: container runtime
Scaling Use-Case No.1
• Resource-intensive services
• Typically CPU/memory-bound apps/services
• Containers & underlying VMs scale to meet demand
Scaling Use-Case No.2 ... ?
• Multi-job experiments
• Typically batch/parameter sweep jobs
• Containers/VMs scale to complete jobs by deadline
• Where do we put the jobs?
• How do we execute them (in containers!)?
[Figure: the MiCADO master (Submitter, Policy Keeper with scaling logic, container and cloud orchestrator) receives an ADT describing infrastructure and scaling rules, and scales MiCADO workers up/down. Each worker runs a jQueuer Agent and cqueue workers that execute jobs. The source of the jobs is still a placeholder: <insert queue here>.]
JQueuer
• Asynchronous Distributed Task Queue
• Master Component: runs externally; queue & monitoring
• Agent Component: runs on worker VMs; fetches & executes jobs
[Figure: the end user submits an experiment .json to the external jQueuer Master, which holds the jobs. The MiCADO master (Submitter, Policy Keeper with scaling logic, container and cloud orchestrator) receives an ADT describing infrastructure and scaling rules, and scales MiCADO workers up/down. Each worker runs a jQueuer Agent and cqueue workers that execute jobs.]
JQueuer Metrics
Metrics exported to MiCADO for scaling:
• Queue length
• Jobs completed
• Jobs failed
• Jobs running
• Jobs remaining
• Time elapsed
• Average job length
• Time to deadline
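As a rough illustration of how three of these metrics (jobs remaining, average job length, time to deadline) could drive a deadline-based scaling decision, here is a minimal sketch. It is not MiCADO's actual scaling policy: the function name `workers_needed`, the worker limits, and the assumption that each worker runs one job at a time are all hypothetical.

```python
import math


def workers_needed(jobs_remaining, avg_job_seconds, seconds_to_deadline,
                   min_workers=1, max_workers=10):
    """Estimate the worker count needed to finish by the deadline.

    Hypothetical rule, assuming each worker runs one job at a time and
    that jobs_remaining counts every job that has not finished yet.
    """
    if jobs_remaining <= 0:
        return min_workers          # nothing left: scale down to the floor
    if seconds_to_deadline <= 0:
        return max_workers          # deadline passed: use everything allowed
    # Total outstanding work, spread evenly over the time that is left.
    work_seconds = jobs_remaining * avg_job_seconds
    required = math.ceil(work_seconds / seconds_to_deadline)
    # Clamp to the allowed range.
    return max(min_workers, min(max_workers, required))


# Illustrative values only: 200 jobs of 90 s each, one hour to go.
print(workers_needed(200, 90, 3600))
# Half the jobs left but only 20 minutes to go.
print(workers_needed(100, 90, 1200))
```

The estimate treats work as perfectly divisible, so it is optimistic when individual jobs are long relative to the time remaining; a real policy would likely add a safety margin.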
The experiment
Determining the impact of changes in behavior on the spread of a disease across a population
Experiment design
• Agent-based simulation
• Repast Simphony
• Three agents: Infected, Susceptible, Recovered
• Simulate movement & interaction of agents in an environment
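To make the design concrete, the following is a minimal agent-based sketch of susceptible, infected and recovered agents moving and interacting on a grid. It is written in plain Python rather than Repast Simphony and is not the model used in the experiment; the grid size, infection probability, recovery time and all names are assumptions chosen for illustration.

```python
import random

SUSCEPTIBLE, INFECTED, RECOVERED = "S", "I", "R"


def simulate(n_agents=200, n_infected=5, grid=20, steps=100,
             p_infect=0.5, recovery_steps=15, seed=1):
    """Run a toy SIR agent-based model and return per-step state counts."""
    rng = random.Random(seed)
    agents = [{"x": rng.randrange(grid), "y": rng.randrange(grid),
               "state": INFECTED if i < n_infected else SUSCEPTIBLE,
               "sick_for": 0}
              for i in range(n_agents)]
    history = []
    for _ in range(steps):
        # Movement: each agent takes a random step on a wrapping grid.
        for a in agents:
            a["x"] = (a["x"] + rng.choice((-1, 0, 1))) % grid
            a["y"] = (a["y"] + rng.choice((-1, 0, 1))) % grid
        # Interaction: a susceptible agent sharing a cell with an infected
        # agent becomes infected with probability p_infect.
        infected_cells = {(a["x"], a["y"]) for a in agents
                          if a["state"] == INFECTED}
        newly_infected = [a for a in agents
                          if a["state"] == SUSCEPTIBLE
                          and (a["x"], a["y"]) in infected_cells
                          and rng.random() < p_infect]
        # Recovery: infected agents recover after recovery_steps steps.
        for a in agents:
            if a["state"] == INFECTED:
                a["sick_for"] += 1
                if a["sick_for"] >= recovery_steps:
                    a["state"] = RECOVERED
        for a in newly_infected:
            a["state"] = INFECTED
        history.append({s: sum(a["state"] == s for a in agents)
                        for s in (SUSCEPTIBLE, INFECTED, RECOVERED)})
    return history


final = simulate()[-1]
print(final)
```

In a sketch like this, a change in behaviour could be represented by varying `p_infect` or the movement rule, and each parameter combination would be one job in a parameter sweep.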
Manual job allocation (baseline)
• 200 jobs
• Equal distribution: 40 jobs per VM
• 1 hour to complete all jobs
[Figure: five VMs (VM 1 to VM 5), each running one Repast instance (Repast 1 to Repast 5).]
Automatic job allocation (MiCADO)
• experiment.json: 200 jobs, 1-hour deadline
• Submitted to the JQueuer Manager
[Figure: the JQueuer Manager and the MiCADO Master feed MiCADO Workers 1 to n, each running a JQueuer Agent and a Repast container.]
Results
• Dynamic allocation of variable-length jobs results in a better use of cloud resources
• Manual allocation (baseline): 5 VMs
• Dynamic allocation (MiCADO): 3.86 VMs
[Figure: charts comparing "Manually Allocated" with "Allocated by MiCADO".]
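One way to see why pulling variable-length jobs from a shared queue can use resources better than a fixed equal split is the small simulation below. It is a sketch under stated assumptions, not a reproduction of the experiment: job lengths are drawn uniformly at random, the VM count is fixed in both cases (MiCADO additionally scales the number of VMs), and all function names are hypothetical.

```python
import heapq
import random


def static_makespan(jobs, n_vms):
    """Equal split: every VM gets a fixed share of the jobs up front."""
    return max(sum(jobs[i::n_vms]) for i in range(n_vms))


def queue_makespan(jobs, n_vms):
    """Shared queue: whichever VM becomes free first pulls the next job."""
    free_at = [0.0] * n_vms          # min-heap of times each VM is free
    heapq.heapify(free_at)
    for length in jobs:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + length)
    return max(free_at)


def utilisation(jobs, n_vms, makespan):
    """Fraction of paid VM time spent running jobs, assuming all VMs
    stay up until the last job finishes."""
    return sum(jobs) / (n_vms * makespan)


rng = random.Random(42)
jobs = [rng.uniform(30, 150) for _ in range(200)]   # seconds, illustrative

for name, fn in (("equal split", static_makespan),
                 ("shared queue", queue_makespan)):
    m = fn(jobs, 5)
    print(f"{name}: makespan {m:.0f} s, "
          f"utilisation {utilisation(jobs, 5, m):.2%}")
```

With an equal split, VMs that happen to receive shorter jobs sit idle until the slowest VM finishes; with a shared queue, idle time is bounded by roughly the length of one job.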
Thanks!
• github.com/micado-scale/ansible-micado
• project-cola.eu/
• T. Kiss, J. DesLauriers, G. Gesmier et al., A cloud-agnostic queuing system to support the implementation of deadline-based application execution policies, Future Generation Computer Systems (2019), https://doi.org/10.1016/j.future.2019.05.062
Project Director: Dr. Tamas Kiss, University of Westminster, UK