

  1. Introduction to Cache Quality of Service in the Linux Kernel. Vikas Shivappa (vikas.shivappa@linux.intel.com)

  2. Agenda
     • Problem definition
     • Existing techniques
     • Why use the kernel QoS framework
     • Intel Cache QoS support
     • Kernel implementation
     • Challenges
     • Performance improvement
     • Future work

  3. Without Cache QoS
     [Diagram: low-priority and high-priority apps running on cores C1-C3 over a shared processor cache; the low-priority apps may get more cache]
     - Noisy neighbour: degraded or inconsistent response times make QoS hard to guarantee
     - Cache contention with multithreading

  4. Agenda (next: Existing techniques)

  5. Existing techniques
     • Mostly heuristics on real systems
     • No methodology to identify the cache lines belonging to a particular thread
     • Not configurable by the OS

  6. Agenda (next: Why use the kernel QoS framework)

  7. Why use the kernel QoS framework?
     • A lightweight yet powerful tool for managing the cache
     • Shields threads from the architectural details of ID management and scheduling

  8. With Cache QoS
     [Diagram: user-space low- and high-priority apps, the kernel Cache QoS framework, and Intel QoS hardware support; the framework provides controls to allocate the appropriate processor cache to high-priority apps]
     - Helps maximize performance and meet QoS requirements in cloud or server clusters
     - Mitigates the jitter and inconsistent response times caused by the 'noisy neighbour'

  9. Agenda (next: Intel Cache QoS support)

  10. What is Cache QoS?
      • Cache Monitoring: reports cache occupancy per thread (perf interface)
      • Cache Allocation: lets the user allocate overlapping subsets of the cache to applications (cgroup interface)
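To make the monitoring half concrete, here is a minimal user-space sketch of reading a task's LLC occupancy through perf. It assumes the intel_cqm PMU added by the 4.1 cache monitoring patches, with its dynamic type and the llc_occupancy event (event=0x01, plus a .scale file giving bytes per count) exposed under sysfs; treat the paths and encoding as assumptions, not a stable ABI.

```c
/* Sketch: read per-task LLC occupancy via the intel_cqm perf PMU
 * (sysfs layout assumed from the 4.1 cache monitoring driver). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int main(int argc, char **argv)
{
        struct perf_event_attr attr;
        unsigned long long count;
        int fd, type;
        FILE *f;

        if (argc != 2)
                return 1;

        /* The intel_cqm PMU registers a dynamic type id. */
        f = fopen("/sys/bus/event_source/devices/intel_cqm/type", "r");
        if (!f || fscanf(f, "%d", &type) != 1)
                return 1;
        fclose(f);

        memset(&attr, 0, sizeof(attr));
        attr.type = type;
        attr.size = sizeof(attr);
        attr.config = 1;                /* llc_occupancy: event=0x01 */

        fd = (int)perf_event_open(&attr, atoi(argv[1]), -1, -1, 0);
        if (fd < 0)
                return 1;
        if (read(fd, &count, sizeof(count)) != sizeof(count))
                return 1;

        /* Raw count; scale by the sysfs .scale file to get bytes. */
        printf("llc_occupancy (raw): %llu\n", count);
        close(fd);
        return 0;
}
```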

  11. Cache lines -> Thread ID (identification)
      • Cache Monitoring: RMID (Resource Monitoring ID)
      • Cache Allocation: CLOSid (Class of Service ID)

  12. Representing cache capacity in Cache Allocation (example)
      [Diagram: a capacity bitmask with bits B0..Bn mapping onto cache ways W0..Wk]
      - Cache capacity is represented using a 'cache bitmask'
      - However, the mapping of bits to ways is hardware implementation specific
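For a concrete feel of what a bitmask buys you (an illustration, not from the slides): treating each set bit as one equally sized way, the share of cache a mask grants is simply its popcount over the number of ways. A tiny sketch, assuming a hypothetical 20-way cache:

```c
#include <stdio.h>

/* Approximate cache share granted by a capacity bitmask, assuming
 * each bit maps to one of n_ways equal ways; as the slide notes,
 * the real mapping is hardware implementation specific. */
static double cache_share(unsigned int cbm, unsigned int n_ways)
{
        return (double)__builtin_popcount(cbm) / n_ways;    /* GCC builtin */
}

int main(void)
{
        /* e.g. a hypothetical 30MB, 20-way L3: 0x0000F -> 4 ways -> ~6MB */
        printf("%.0f%% of the cache\n", 100.0 * cache_share(0x0000F, 20));
        return 0;
}
```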

  13. Bitmask -> Class of Service IDs (CLOS)

      Default bitmask: all CLOSids have access to the whole cache
              B7 B6 B5 B4 B3 B2 B1 B0
      CLOS0    A  A  A  A  A  A  A  A
      CLOS1    A  A  A  A  A  A  A  A
      CLOS2    A  A  A  A  A  A  A  A
      CLOS3    A  A  A  A  A  A  A  A

      Overlapping bitmask (only contiguous bits are allowed):
              B7 B6 B5 B4 B3 B2 B1 B0
      CLOS0    A  A  A  A  A  A  A  A
      CLOS1                A  A  A  A
      CLOS2                      A  A
      CLOS3                      A  A
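Since the hardware accepts only contiguous runs of set bits, any kernel or user tooling has to validate candidate masks. A minimal sketch of such a check using a standard bit trick (illustrative, not the actual patch code):

```c
#include <stdbool.h>

/* A capacity bitmask is valid only if it is a single, non-empty,
 * contiguous run of 1s. Adding the lowest set bit carries cleanly
 * past a contiguous run, so the sum shares no bits with the
 * original mask exactly when the run was contiguous. */
static bool cbm_is_contiguous(unsigned long cbm)
{
        if (cbm == 0)
                return false;
        return ((cbm + (cbm & -cbm)) & cbm) == 0;
}
```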

  14. Agenda (next: Kernel implementation)

  15. Kernel implementation
      [Diagram: threads drive the user interface (/sys/fs/cgroup and perf) from user space; the kernel QoS support configures the cache bitmask per CLOSid/RMID via MSRs, sets the CLOSid for a thread during context switch, and reads monitored data from the event counters; the hardware is an Intel Xeon with QoS support on the shared L3 cache]
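A minimal sketch of the two MSR paths in the diagram, kernel-style. The MSR numbers and the PQR_ASSOC field layout (RMID in bits 0-9, CLOSid in bits 32-63) follow the Intel SDM; wrmsrl() and the exact call sites are assumed from arch/x86 conventions rather than taken from the patches.

```c
/* Kernel-style sketch of the hardware interface behind the
 * "kernel QoS support" box (not the actual patch code). */
#include <linux/types.h>
#include <asm/msr.h>                    /* wrmsrl() */

#define MSR_IA32_PQR_ASSOC      0x0c8f  /* per-CPU RMID/CLOSid tag */
#define MSR_IA32_L3_CBM_BASE    0x0c90  /* one mask MSR per CLOSid */

/* Configuration path: program the cache bitmask for one class of
 * service; it takes effect for every thread tagged with that CLOSid. */
static void cbm_write(unsigned int closid, u64 cbm)
{
        wrmsrl(MSR_IA32_L3_CBM_BASE + closid, cbm);
}

/* Context-switch path: tag the CPU with the incoming task's IDs.
 * RMID occupies bits 0-9, CLOSid bits 32-63 of PQR_ASSOC. */
static void pqr_switch_to(unsigned int rmid, unsigned int closid)
{
        wrmsrl(MSR_IA32_PQR_ASSOC, ((u64)closid << 32) | rmid);
}
```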

  16. Usage
      • Monitoring: per-thread cache occupancy in bytes (perf)
      • Allocation: cache per thread through the cache bitmask (cgroup)
      Cgroup defaults exposed to user land when a child is created:
          Clos    : Parent.Clos
          bitmask : Parent.bitmask
          Tasks   : empty
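A small user-space sketch of driving the cgroup side. The mount point and file names below (intel_rdt, l3_cbm, tasks) are hypothetical, modelled on the allocation patches under review at the time; the slide only guarantees that a new cgroup inherits the parent's CLOS and bitmask with an empty tasks file.

```c
/* Sketch: create a cache allocation cgroup, give it a mask, and move
 * a task in. Paths and file names are assumptions (see lead-in). */
#include <stdio.h>
#include <sys/stat.h>

static int write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f)
                return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(void)
{
        /* New cgroup: inherits Parent.Clos / Parent.bitmask, tasks empty. */
        mkdir("/sys/fs/cgroup/intel_rdt/hipri", 0755);

        /* Override the inherited bitmask (contiguous bits only). */
        write_str("/sys/fs/cgroup/intel_rdt/hipri/l3_cbm", "0xff");

        /* Move a (hypothetical) high-priority task into the cgroup. */
        write_str("/sys/fs/cgroup/intel_rdt/hipri/tasks", "1234");
        return 0;
}
```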

  17. Scenarios
      • Units that can be allocated cache:
        – Processes/tasks
        – Virtual machines (move all PIDs of the VM into one cgroup)
        – Containers (put the entire container into one cgroup)
      • Restrict the noisy neighbour
      • Fair cache allocation to resolve cache contention

  18. Agenda (next: Challenges)

  19. Challenges
      • OpenStack usage
      • What if we run out of IDs?
      • What about scheduling overhead?
      • Doing monitoring and allocation together

  20. OpenStack usage
      [Diagram: applications and the OpenStack dashboard on top of OpenStack compute, network, and storage services, running on standard hardware with shared L3 caches; integration is work in progress]

  21. OpenStack usage (continued)
      Work is beginning (not yet stable) on adding changes to Ceilometer (with Qiaowei Ren, qiaowei.ren@intel.com).
      [Diagram: OpenStack -> virt managers (ovirt, ...) -> libvirt -> perf syscall -> kernel (KVM, Xen, ..., Cache QoS)]

  22. What if we run out of IDs?
      • Group tasks together (by process?)
      • Group cgroups that use the same mask onto one CLOSid
      • Return -ENOSPC
      • Postpone the allocation
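The "group cgroups with the same mask" idea amounts to reference-counting CLOSids keyed by bitmask, returning -ENOSPC only when every hardware ID is held by a distinct mask. A sketch under those assumptions (names and structure are illustrative, not the actual patch):

```c
#include <errno.h>

#define NR_CLOS 16      /* hypothetical number of hardware CLOSids */

static struct {
        unsigned long cbm;      /* mask programmed for this CLOSid */
        unsigned int refcnt;    /* number of cgroups sharing it */
} clos_map[NR_CLOS];

/* Reuse a CLOSid whose mask already matches; otherwise take a free
 * one; otherwise report that the hardware IDs are exhausted. */
static int closid_get(unsigned long cbm)
{
        int i, free_id = -1;

        for (i = 0; i < NR_CLOS; i++) {
                if (clos_map[i].refcnt && clos_map[i].cbm == cbm) {
                        clos_map[i].refcnt++;
                        return i;
                }
                if (!clos_map[i].refcnt && free_id < 0)
                        free_id = i;
        }
        if (free_id < 0)
                return -ENOSPC;
        clos_map[free_id].cbm = cbm;
        clos_map[free_id].refcnt = 1;
        return free_id;
}
```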

  23. Scheduling performance
      • An MSR read/write costs 250-300 cycles
      • Keep a cache of the last value written; grouping helps!
      • Don't touch the MSR until the user actually creates a new cache mask
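A sketch of the caching idea, modelled loosely on per-CPU state in arch/x86: remember the last value written on this CPU and skip the wrmsr when the incoming task resolves to the same RMID/CLOSid, which grouping makes common.

```c
/* Kernel-style sketch: avoid the ~250-300 cycle MSR write at context
 * switch when the RMID/CLOSid pair is unchanged. */
#include <linux/percpu.h>
#include <asm/msr.h>

#define MSR_IA32_PQR_ASSOC      0x0c8f  /* as in the earlier sketch */

static DEFINE_PER_CPU(u64, pqr_cache);

static void pqr_update(u64 val)
{
        if (__this_cpu_read(pqr_cache) == val)
                return;                 /* same IDs as the last task */
        wrmsrl(MSR_IA32_PQR_ASSOC, val);
        __this_cpu_write(pqr_cache, val);
}
```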

  24. Monitor and allocate
      • RMID (monitoring) and CLOSid (allocation) are different IDs
      • No easy way to monitor and allocate for the same set of tasks
        – perf cannot monitor the cache allocation cgroup (?)

  25. Agenda (next: Performance improvement and Future work)

  26. Performance measurement
      • Intel Xeon based server, 16GB RAM, 30MB L3, 24 logical processors
      • RHEL 6.3
      • Comparison with and without cache allocation
      • Controlled experiment:
        – A PCIe device generates an MSI interrupt; measure the time to respond
        – Memory-traffic-generating workloads run alongside (the noisy neighbour)
      • The experiment does not use the current cache allocation patch

  27. Performance measurement [1]
      [Chart: response latencies with and without cache allocation]
      - Minimum latency: 1.3x improvement; maximum latency: 1.5x improvement; average latency: 2.8x improvement
      - Better consistency in response times, with less jitter and latency despite the noisy neighbour

  28. Patch status
      Cache Monitoring                          Upstream in 4.1 (Matt Fleming, matt.fleming@intel.com)
      Cache Allocation                          Under review (Vikas Shivappa, vikas.shivappa@intel.com)
      Code and Data Prioritization              Under review (Vikas Shivappa, vikas.shivappa@intel.com)
      OpenStack integration (libvirt update)    Work started (Qiaowei Ren, qiaowei.ren@intel.com)

  29. Future work
      • Performance improvement measurement
      • Separate allocation for code and data
        – First patches shared on LKML
      • Monitor and allocate the same unit
      • OpenStack integration
      • Container usage

  30. Acknowledgements
      • Matt Fleming (cache monitoring support, Intel SSG)
      • Will Auld (architect and principal engineer, Intel SSG)
      • CSIG, Intel

  31. References
      [1] http://www.intel.com/content/www/us/en/communications/cache-allocation-technology-white-paper.html

  32. Questions?
