  1. SOFT CONTAINER TOWARDS 100% RESOURCE UTILIZATION ACCELA ZHAO, LAYNE PENG

  2. WHO ARE THESE GUYS …
     • Accela Zhao, Technologist at EMC OCTO, active OpenStack community contributor, experienced in cloud scheduling and container technologies. Mail: accela.zhao@emc.com
     • Layne Peng, Principal Technologist at EMC OCTO, experienced cloud architect, one of the earliest contributors to Cloud Foundry in China, holder of 9 patents and a book author. Mail: layne.peng@emc.com Twitter: @layne_peng

  3. WHAT IS RESOURCE UTILIZATION? (Chart: the gap between what we buy and what we actually use is $$$ wasted.)

  4. ENERGY AND RESOURCE UTILIZATION
     • Real-world resource utilization is usually low: around 20% or less
     • An idle server consumes as much as 70% of the energy of one running at full speed
     • Energy-related costs make up 42% of the total (including buying new machines)
     • Low resource utilization is energy-inefficient: wasted energy, wasted money

  5. A CLOSER LOOK AT CLOUD The key advantage of cloud is consolidation: improved resource utilization. Fewer machines, more apps. Energy-efficient, and it saves money.

  6. RESOURCE UTILIZATION ON CLOUD
     • Scheduling - choose the best resource placement when the app starts
       – Examples: Green Cloud, Paragon; and the schedulers in OpenStack, Kubernetes, Mesos, …
     • Migration - continuously optimize the resource placement while the app is running
       – Examples: OpenStack Watcher, VMware DRS
     • Soft Container - elastic: dynamically adjust container resource constraints in response to co-located apps
       – Related: Google Heracles

  7. RESOURCE UTILIZATION ON CLOUD
     • Scheduler - manages resource utilization at app kick-off
     • Migration - manages resource utilization across hosts while the app is running
     • Soft Container - manages resource utilization at fine granularity inside the host

  8. RESOURCE UTILIZATION ON CLOUD A battle between putting more apps on each host and guaranteeing app SLAs. The key problem: resource interference.

  9. THE KEY PROBLEM: RESOURCE INTERFERENCE
     • What is resource interference?
       – Apps co-located on one host share resources like CPU, cache, memory, …
       – They interfere with each other, resulting in poor performance compared to running standalone
       – Resource interference makes SLAs unenforceable
     • Related readings
       – Google Heracles: an analysis of resource interference
       – Paragon: resource-interference-aware scheduling
       – Bubble-Up: how to measure resource interference

  10. RESOURCE INTERFERENCE: HOW DOES IT LOOK? MySQL running standalone vs. co-located with a CPU- and disk-hungry task.

  11. RESOURCE INTERFERENCE: HOW TO MEASURE?
     • Bubble-Up
       – The setup
         • Run the app co-located with resource benchmarks; each benchmark stresses one type of resource
       – App-tolerated resource interference
         • Slowly increase the benchmark stress until the app fails its SLA.
         • The critical point shows how much resource interference the app can tolerate.
       – App-caused resource interference
         • Run the app at the level its SLA requires.
         • The stress it causes on each type of resource is the app’s caused resource interference.
     • Where to use it?
       – Better resource utilization management
       – Scheduling, Migration, Soft Container, …
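The tolerated-interference half of Bubble-Up is essentially a search loop. A minimal Python sketch, where `meets_sla` is a hypothetical probe standing in for a real run of the app next to a benchmark at a given stress level:

```python
def tolerated_interference(meets_sla, max_stress=100, step=5):
    """Bubble-Up: raise the co-located benchmark's stress level until
    the app fails its SLA; the last passing level is the interference
    the app tolerates. `meets_sla(stress)` is a hypothetical probe."""
    tolerated = 0
    for stress in range(step, max_stress + 1, step):
        if not meets_sla(stress):
            break
        tolerated = stress
    return tolerated

# Toy probe: the app holds its SLA up to stress level 40.
print(tolerated_interference(lambda s: s <= 40))  # 40
```

A real implementation would replace the probe with a benchmark run plus a latency/throughput check against the SLA.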

  12. RESOURCE INTERFERENCE: HOW TO MEASURE? MySQL running standalone, vs. co-located with CPU stress, vs. with disk stress. In my case, MySQL is much more sensitive to CPU interference.

  13. INTRODUCING SOFT CONTAINER
     • Motivations
       – Increase resource utilization by co-locating more apps
         • E.g. a business service is critical but may not use all the resources on its host. Add low-priority Hadoop batch tasks to fill what is left.
       – Respond to the dynamic nature of time-varying workloads
         • E.g. the business service may become more idle at lunch time; the Hadoop tasks can then expand their resource bubble and use the leftover.
       – Guarantee the SLA of critical apps
         • E.g. when the business service suddenly requires more resources for processing, the Hadoop tasks shrink instantly to give resources back.
     • Challenges
       – Resource control and isolation of interference
       – Responding to dynamic workload change

  14. INTRODUCING SOFT CONTAINER
     • What does “Soft” mean?
       – Varying a container’s resource limits based upon its neighbors and SLAs (the container becomes elastic)
       – “Expanding” (bubbling up) resources when idle resources exist
       – Shrinking resources on a specific container when another, critical app demands more resources
     (Chart: a container’s resource bubble varying over time)
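The expand/shrink behavior can be sketched as one adjustment rule. This is a toy model in abstract resource units, not a real allocator; `host_free` is negative when the critical app is reclaiming resources:

```python
def adjust_bubble(current, own_demand, host_free, floor=1):
    """Elastic resource bubble for a low-priority container: expand
    into idle host resources when they exist, shrink toward a floor
    when the critical app reclaims them. A sketch in abstract units."""
    if host_free > 0:                        # idle resources exist: expand
        return current + min(host_free, own_demand)
    if host_free < 0:                        # critical app needs more: shrink
        return max(floor, current + host_free)
    return current

print(adjust_bubble(current=4, own_demand=10, host_free=6))    # 10
print(adjust_bubble(current=10, own_demand=10, host_free=-6))  # 4
```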

  15. THE FEEDBACK CONTROL LOOP (Diagram: Soft Container runs a feedback control loop over the containers: a Watcher observes the Containers, the Controller decides, and a Limiter applies the new resource constraints.)
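One iteration of that loop can be sketched in a few lines. The three callables here (`watch`, `decide`, `limit`) are hypothetical stand-ins for the Watcher, Controller, and Limiter components:

```python
def control_step(watch, decide, limit, containers):
    """One Watcher -> Controller -> Limiter iteration: observe each
    container, compute new limits, apply them. The callables are
    stand-ins for real monitoring, policy and cgroup-writing parts."""
    observations = {c: watch(c) for c in containers}
    new_limits = decide(observations)
    for c, lim in new_limits.items():
        limit(c, lim)
    return new_limits

# Toy policy: give each container a limit of 2x its observed usage.
applied = {}
out = control_step(
    watch=lambda c: {"web": 30, "batch": 10}[c],
    decide=lambda obs: {c: 2 * u for c, u in obs.items()},
    limit=lambda c, lim: applied.update({c: lim}),
    containers=["web", "batch"],
)
print(out)  # {'web': 60, 'batch': 20}
```

In a real system the loop would run periodically, with `limit` writing cgroup control files.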

  16. RESOURCES TO LIMIT
     • CPU: Core, Time Quota, …
     • Memory: Size, Bandwidth, …
     • Disk I/O: IOPS, Throughput, …

  17. RESOURCES TO LIMIT - MISSING
     • CPU: Core, Time Quota, …
     • Memory: Size, Bandwidth*, …
     • Disk I/O: IOPS, Throughput, …
     • Cache: LLC, …
     • Network: Ulimit, Bandwidth, …
     • GPU: …
     • Device*: …
     As of kernel 3.6, most of this support can be found in the community…

  18. ISOLATING THE RESOURCES - NAMESPACES
     • clone(): create a new process and attach it to a new namespace
     • unshare(): create a new namespace and attach it to an existing process
     • setns(): set a process to an existing namespace
     • /proc/<pid>/ns:
       lrwxrwxrwx 1 root root 0 Jun 21 18:38 ipc -> ipc:[4026532509]
       lrwxrwxrwx 1 root root 0 Jun 21 18:38 mnt -> mnt:[4026532507]
       lrwxrwxrwx 1 root root 0 Jun 16 18:24 net -> net:[4026532512]
       lrwxrwxrwx 1 root root 0 Jun 21 18:38 pid -> pid:[4026532510]
       lrwxrwxrwx 1 root root 0 Jun 21 18:38 user -> user:[4026531837]
       lrwxrwxrwx 1 root root 0 Jun 21 18:38 uts -> uts:[4026532508]
     • We are still waiting for:
       – security namespace
       – security keys namespace
       – device namespace
       – time namespace
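The `/proc/<pid>/ns` symlink targets above encode namespace membership: two processes share a namespace exactly when their links point at the same inode. A small sketch that parses those targets (shown here on strings taken from the listing, rather than reading a live `/proc`):

```python
import re

def parse_ns_link(link_target):
    """Parse a /proc/<pid>/ns symlink target like 'ipc:[4026532509]'
    into (namespace_type, inode_number)."""
    m = re.fullmatch(r"(\w+):\[(\d+)\]", link_target)
    if m is None:
        raise ValueError("unexpected ns link format: %r" % link_target)
    return m.group(1), int(m.group(2))

def same_namespace(link_a, link_b):
    """Two processes share a namespace iff their ns links have the
    same inode number."""
    return parse_ns_link(link_a)[1] == parse_ns_link(link_b)[1]

print(parse_ns_link("ipc:[4026532509]"))  # ('ipc', 4026532509)
print(same_namespace("pid:[4026532510]", "pid:[4026531836]"))  # False
```

On a live host you would feed `os.readlink("/proc/%d/ns/pid" % pid)` into the same comparison.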

  19. LIMIT THE RESOURCES - CGROUPS
     • Task, Control Group & Hierarchy
     • Subsystems (control options): blkio, cpu, cpuacct, cpuset, devices, freezer, memory, net_cls, net_prio, ns, …
     • Usage: create a cgroup under a subsystem, then change its limits, e.g.
       # echo 524288000 > /sys/fs/cgroup/memory/foo/memory.limit_in_bytes
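A Limiter component would do the same write programmatically. A minimal sketch, demonstrated against a temporary directory since writing real cgroup files needs root; on a real host you would pass `/sys/fs/cgroup/memory` and let the kernel enforce the limit:

```python
import os
import tempfile

def set_cgroup_limit(cgroup_root, group, knob, value):
    """Write a limit into a cgroup-v1 control file, e.g.
    <root>/foo/memory.limit_in_bytes. The group directory is created
    if needed (on a real host the kernel populates the knob files)."""
    path = os.path.join(cgroup_root, group)
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, knob), "w") as f:
        f.write(str(value))

# Demo against a temp dir standing in for /sys/fs/cgroup/memory:
root = tempfile.mkdtemp()
set_cgroup_limit(root, "foo", "memory.limit_in_bytes", 500 * 1024 * 1024)
with open(os.path.join(root, "foo", "memory.limit_in_bytes")) as f:
    print(f.read())  # 524288000
```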

  20. MISSING - NETWORK Isolation does not mean resource control. Suppose two containers on a machine sharing 100 Gbps of bandwidth in total: one consumes 10, the other 80.

  21. MISSING - NETWORK Isolation does not mean resource control. Suppose two containers on a machine sharing 100 Gbps of bandwidth in total. If the GREEN container consumes the majority of the bandwidth (say 95 of the 100 Gbps), it may have a negative impact on the BLUE one. How can we keep this from happening?

  22. MISSING - NETWORK A nightmare for PaaS providers … Community attempts: based on Traffic Control (tc).

  23. MISSING - NETWORK A nightmare for PaaS providers … Community attempts: based on Traffic Control (tc).
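The tc-based attempts typically cap a container's egress with an HTB class. A sketch that builds the commands (the interface name, handles, and class ids here are illustrative, not from any specific project):

```python
def tc_limit_cmds(iface, container_classid, rate_gbps, ceil_gbps):
    """Build tc commands to cap a container's egress bandwidth with
    an HTB class: a root htb qdisc plus one class with a guaranteed
    rate and a ceiling. Handles and class ids are illustrative."""
    return [
        f"tc qdisc add dev {iface} root handle 1: htb default 10",
        f"tc class add dev {iface} parent 1: classid 1:{container_classid} "
        f"htb rate {rate_gbps}gbit ceil {ceil_gbps}gbit",
    ]

for cmd in tc_limit_cmds("eth0", 20, 10, 20):
    print(cmd)
```

Traffic from a container would then be steered into the class, e.g. via the net_cls cgroup's classid or a tc filter.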

  24. MISSING - GPU
     Nvidia’s efforts:
     a. GPUs exposed as separate normal devices in /dev
     b. devices cgroup: Allow/Deny/List
        • Access: i. R  ii. W  iii. M
     Ref: https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation

  25. MISSING - GPU
     Nvidia’s efforts:
     a. GPUs exposed as separate normal devices in /dev
     b. devices cgroup: Allow/Deny/List
        • Access: i. R  ii. W  iii. M
     Usable, but insufficient …
     1. Launch multiple jobs in parallel, each using a subset of the available GPUs;
     2. How about sharing a GPU between jobs with proper isolation? Can we share a GPU like we can a CPU?
     Ref: https://github.com/NVIDIA/nvidia-docker/wiki/GPU-isolation
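The devices-cgroup rules above are strings of the form "type major:minor access". A sketch that builds one for an Nvidia GPU node (`/dev/nvidiaN` is a char device with major 195 and minor N; `rwm` = read, write, mknod):

```python
def gpu_device_rule(minor, access="rwm", major=195):
    """Build a devices-cgroup rule for an Nvidia GPU node:
    'c <major>:<minor> <access>'. /dev/nvidiaN uses char major 195."""
    return f"c {major}:{minor} {access}"

# E.g. to deny a container all access to GPU 1, one would write the
# rule into the group's devices.deny file:
#   echo "c 195:1 rwm" > /sys/fs/cgroup/devices/<group>/devices.deny
print(gpu_device_rule(1))  # c 195:1 rwm
```

This is exactly the whole-device granularity the slide criticizes: you can allow or deny a GPU, but not share one between jobs with isolation.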

  26. MISSING - CACHE
     Intel’s efforts:
     • Cache Allocation Technology (CAT)
       – The ability to enumerate the CAT capability and the associated LLC allocation support via CPUID.
       – Interfaces for the OS/hypervisor to group applications into classes of service (CLOS) and indicate the amount of last-level cache available to each CLOS. These interfaces are based on MSRs (Model-Specific Registers).
     • Cache Monitoring Technology (CMT)
       – For an OS or VMM to indicate a software-defined ID for each application or VM scheduled to run on a core. This ID is called the Resource Monitoring ID (RMID).
       – To monitor cache occupancy on a per-RMID basis.
       – For an OS or VMM to read LLC occupancy for a given RMID at any time.
     • Code and Data Prioritization (CDP), an extension to CAT
       – A new CPUID feature flag is added within the CAT sub-leaves at CPUID.0x10.[ResID=1]:ECX[bit 2] to indicate support.
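On Linux, CAT is exposed through the resctrl filesystem: a resource group's share of the LLC is set by writing a schemata line mapping each cache id to a capacity bitmask of cache ways. A sketch that builds such a line (the group name and mounting of `/sys/fs/resctrl` are assumed; whether CAT is available depends on the CPU and kernel):

```python
def cat_schemata_line(cache_id_masks):
    """Build an L3 CAT schemata line for the Linux resctrl filesystem
    (/sys/fs/resctrl/<group>/schemata). Each entry maps a cache id to
    a capacity bitmask of LLC ways, written in hex."""
    parts = ";".join(f"{cid}={mask:x}" for cid, mask in cache_id_masks)
    return f"L3:{parts}"

# Give a group 4 ways on cache 0 and 8 ways on cache 1:
print(cat_schemata_line([(0, 0xF), (1, 0xFF)]))  # L3:0=f;1=ff
```

Writing the line into a group's `schemata` file constrains the LLC available to all tasks in that group, which is the CLOS mechanism the slide describes.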

  27. MISSING – MEMORY BANDWIDTH
     Monitor: Memory Bandwidth Monitoring (MBM)
     • Mechanisms in hardware to monitor cache occupancy and bandwidth statistics, as applicable to a given product generation, on a per-software-ID basis.
     • Mechanisms for the OS or hypervisor to read back the collected metrics, such as L3 occupancy or memory bandwidth, for a given software ID at any point during runtime.
     Control:
     Ref: Memory Bandwidth Management for Efficient Performance Isolation in Multi-core Platforms: http://pertsserver.cs.uiuc.edu/~mcaccamo/papers/private/IEEE_TC_journal_submitted_C.pdf
     Code: https://github.com/heechul/memguard
