
Stratus: Clouds with Microarchitectural Resource Management - PowerPoint PPT Presentation



  1. Stratus: Clouds with Microarchitectural Resource Management Kaveh Razavi and Animesh Trivedi

  2. Once Upon a Time in the Cloud
     ● Large: 4 cores, 16 GB
     ● Medium: 2 cores, 8 GB
     ● Small: 1 core, 4 GB

  3. Once Upon a Time in the Cloud
     ● Large: 4 cores, 16 GB
     ● Medium: 2 cores, 8 GB
     ● Small: 1 core, 4 GB
     [Diagram labels: cloud provider, tenant, allocate(small), CPU, DRAM]

  4. Once Upon a Time in the Cloud
     ● Large: 4 cores, 16 GB
     ● Medium: 2 cores, 8 GB
     ● Small: 1 core, 4 GB
     [Diagram labels: cloud provider, tenant, allocate(small), CPU, DRAM]

  5. Once Upon a Time in the Cloud
     ● Large: 4 cores, 16 GB
     ● Medium: 2 cores, 8 GB
     ● Small: 1 core, 4 GB
     [Diagram labels: cloud provider, tenant, allocate(small), tenant, allocate(medium), CPU, DRAM]

  6. Once Upon a Time in the Cloud
     ● Large: 4 cores, 16 GB
     ● Medium: 2 cores, 8 GB
     ● Small: 1 core, 4 GB
     [Diagram labels: cloud provider, allocate(small), allocate(medium), CPU, Shared L3 Cache, DRAM]
     What could possibly go wrong here when two tenants share the L3 cache?

  7. Problems with (Unsupervised) Sharing
     [Diagram labels: CPU, Shared L3 Cache]
     (a) Performance: dCat (EuroSys’18): 57.6% improvements for Redis with noisy neighbours
     (b) Security: NetCAT (S&P’20): detect activity of another tenant over network
     References:
     - Cong Xu, et al. DCat: Dynamic Cache Management for Efficient, Performance-Sensitive Infrastructure-as-a-Service. ACM EuroSys 2018.

  8. Problems with (Unsupervised) Sharing
     [Diagram labels: CPU, Shared L3 Cache]
     (a) Performance: dCat (EuroSys’18): 57.6% improvements for Redis with noisy neighbours
     (b) Security: NetCAT (S&P’20): detect activity of another tenant over network
     References:
     - Cong Xu, et al. DCat: Dynamic Cache Management for Efficient, Performance-Sensitive Infrastructure-as-a-Service. ACM EuroSys 2018.
     - Kurth, M.; Gras, B.; Andriesse, D.; Giuffrida, C.; Bos, H.; and Razavi, K. NetCAT: Practical Cache Attacks from the Network. S&P 2020.

  9. Problems with (Unsupervised) Sharing
     [Diagram labels: CPU, Shared L3 Cache]
     (a) Performance: dCat (EuroSys’18): 57.6% improvements for Redis with noisy neighbours
     (b) Security: NetCAT (S&P’20): detect activity of another tenant over network
     Are these two examples L3 Cache specific?
     References:
     - Cong Xu, et al. DCat: Dynamic Cache Management for Efficient, Performance-Sensitive Infrastructure-as-a-Service. ACM EuroSys 2018.
     - Kurth, M.; Gras, B.; Andriesse, D.; Giuffrida, C.; Bos, H.; and Razavi, K. NetCAT: Practical Cache Attacks from the Network. S&P 2020.

  10. New Classes of Microarchitectural Resources
      Resource: Microarchitectural Resource
      ● CPU: Caches, TLBs, Hyperthreads, ALUs
      ● Smart NICs: Caches (memory, requests, connection), TLBs, RMT pipelines, DMA engines
      ● NVM Storage: Blocks, pages, internal r/w ports, programmable cores, SRAM
      ● GPUs: Memories, caches, execution units
      ● In-Network Switches: SRAM and TCAM memories, Match-action Unit processors, ALUs
      And more: TPUs, FPGAs, near-memory compute elements …

  11. What is happening
      ● Diverse microarchitectural resources are here to stay
      ● They have security and performance ramifications
      ● The root cause is unsupervised sharing (or lack of isolation)
      Key challenge: How to manage microarchitectural resources in a principled manner?

  12. Stratus: Clouds with Principled Microarchitectural Resource Management
      Key property: isolation

  13. Stratus: Clouds with Principled Microarchitectural Resource Management
      ● A cloud resource allocation framework
      ● Security and performance requirements are the two sides of isolation
      ● Captures and reasons about microarchitectural resource isolation

  14. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  15. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  16. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      Isolation is captured as constraints on resource allocations.

  17. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      Isolation is captured as constraints on resource allocations.
      ● resource: hard (discrete allocation): LLC slots, ALUs, TLBs; soft (contended in time): DRAM bandwidth
      ● scale: extent of isolation requested {0,1}
      ● quantity: number of resources for which this constraint must be satisfied

  18. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      ATTACH (handle1, handle2, ...);
      Multiple constraints can be “attached” (AND) by their handles.

  19. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      ATTACH (handle1, handle2, ...);
      ALLOCATE cloud_resource, .. where constraints
      Pass multiple microarchitectural constraints during cloud resource allocations (also labeled grouped constraints, see the paper).

  20. Capturing Isolation: A Declarative Interface
      handle = ISOLATE (resource, scale, quantity);
      ATTACH (handle1, handle2, ...);
      ALLOCATE cloud_resource, .. where constraints
      E.g., mitigating NetCAT:
          ALLOCATE small where
              h1 = ISOLATE (CPU.LLC, 1.0, 64),  // the first 64 lines
              h2 = ISOLATE (NIC.*, *, ...),     // all NIC-level uarch
              ATTACH (h1, h2);                  // both for small VM allocations
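To make the shape of the interface concrete, the three primitives on these slides can be modelled in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the names `Constraint`, `Request`, `isolate`, `attach` and `allocate`, the handle counter, and the validation rules are all assumptions, and the second constraint on `CPU.TLB` is a hypothetical stand-in because the slide elides the NIC arguments.

```python
from dataclasses import dataclass
from itertools import count
from typing import Tuple

_next_handle = count(1)  # hypothetical source of unique handles


@dataclass(frozen=True)
class Constraint:
    handle: int
    resource: str   # e.g. "CPU.LLC"
    scale: float    # extent of isolation requested, assumed to lie in [0, 1]
    quantity: int   # number of resource units the constraint covers


def isolate(resource: str, scale: float, quantity: int) -> Constraint:
    """Model of ISOLATE: record one isolation constraint under a fresh handle."""
    if not 0.0 <= scale <= 1.0:
        raise ValueError("scale must lie between 0 and 1")
    if quantity < 1:
        raise ValueError("quantity must be positive")
    return Constraint(next(_next_handle), resource, scale, quantity)


def attach(*constraints: Constraint) -> Tuple[Constraint, ...]:
    """Model of ATTACH: the attached constraints must all hold together (AND)."""
    return tuple(constraints)


@dataclass(frozen=True)
class Request:
    instance: str
    constraints: Tuple[Constraint, ...]


def allocate(instance: str, where: Tuple[Constraint, ...]) -> Request:
    """Model of ALLOCATE ... where ...: pair an instance type with constraints."""
    return Request(instance, where)


h1 = isolate("CPU.LLC", 1.0, 64)  # taken from the slide's NetCAT example
h2 = isolate("CPU.TLB", 1.0, 1)   # hypothetical second constraint
request = allocate("small", where=attach(h1, h2))
print(request)
```

The request object carries only declarations; deciding whether and where they can be satisfied is left to the provider, which is the point of a declarative interface.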

  21. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  22. Building a Reasoning Framework
      Cloud Knowledge Base (CKB)
      ● Similar in spirit to the SKB in Barrelfish OS (SOSP’09)
      ● Structured representation of knowledge in one place
      ● Can be queried
      Inputs to the CKB:
      ● Resources
        - Topology, numbers, types
        - Datasheets information
      ● Online measurements
        - Utilization, spare
        - Occupancy

  23. Building a Reasoning Framework
      Cloud Knowledge Base (CKB)
      ● Resources
        - Topology, numbers, types
        - Datasheets information
      ● Online measurements
        - Utilization, spare
        - Occupancy
      ● Tenant’s constraints (CNF format)
      [Diagram labels: CKB, Allocation strategy]
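One way to picture the reasoning step is as a query that checks tenant constraints in CNF (an AND of clauses, each clause an OR of requirements) against the spare capacity recorded in the knowledge base. The sketch below assumes a toy CKB that is just a dictionary of spare units per host; the host names, resource names, numbers, and the `candidate_hosts` function are hypothetical and not taken from the talk. It also checks each requirement independently, so it ignores the case where two clauses compete for the same units.

```python
from typing import Dict, List, Tuple

Requirement = Tuple[str, int]   # (resource name, units needed in isolation)
Clause = List[Requirement]      # requirements joined by OR
CNF = List[Clause]              # clauses joined by AND

# Hypothetical knowledge base: spare units of each resource on each host.
ckb: Dict[str, Dict[str, int]] = {
    "host-a": {"CPU.LLC": 32, "CPU.TLB": 4},
    "host-b": {"CPU.LLC": 128, "CPU.TLB": 2},
}


def requirement_holds(spare: Dict[str, int], requirement: Requirement) -> bool:
    """A requirement holds if the host has at least that many spare units."""
    resource, units = requirement
    return spare.get(resource, 0) >= units


def satisfies(spare: Dict[str, int], cnf: CNF) -> bool:
    """Every clause needs at least one requirement that holds."""
    return all(any(requirement_holds(spare, r) for r in clause) for clause in cnf)


def candidate_hosts(kb: Dict[str, Dict[str, int]], cnf: CNF) -> List[str]:
    """Query the knowledge base for hosts that can satisfy the constraints."""
    return sorted(host for host, spare in kb.items() if satisfies(spare, cnf))


# (64 LLC units) AND (1 TLB unit OR 1 NIC cache unit)
tenant_cnf: CNF = [[("CPU.LLC", 64)], [("CPU.TLB", 1), ("NIC.cache", 1)]]
print(candidate_hosts(ckb, tenant_cnf))
```

In this toy data only `host-b` has enough spare LLC units, so it is the single candidate; a real allocation strategy would then choose among candidates using the online measurements.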

  24. Stratus: Clouds with Principled Microarchitectural Resource Management
      Q1: How to capture isolation?
      Q2: How to reason about isolation?
      Q3: How to charge for isolation?

  25. Charging for Isolation: Isolation Credits
      Isolation spectrum
      ● Low isolation: + High utilization, + Better efficiency, - Low perf/security
      ● High isolation: + Low interference, + Better perf/security, - Low utilization

  26. Charging for Isolation: Isolation Credits
      Isolation spectrum
      ● Low isolation: + High utilization, + Better efficiency, - Low perf/security
      ● High isolation: + Low interference, + Better perf/security, - Low utilization
      Isolation credit: a currency to capture this tradeoff
      ● Encourages tenants to only specify relevant constraints
      ● Encourages providers to innovate in better isolation mechanisms

  27. Charging for Isolation: Isolation Credits
      Isolation spectrum
      ● Low isolation: + High utilization, + Better efficiency, - Low perf/security
      ● High isolation: + Low interference, + Better perf/security, - Low utilization
      [Diagram labels: Stratus Cloud; tenant who knows: constraints, 42 credits; tenant with a budget: credit budget = 42, constraints]
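The two directions in the credit diagram (constraints in, credits out; budget in, constraints out) can be illustrated with a toy pricing model. Everything in this sketch is an assumption made for illustration: the per-unit prices, the linear cost function, and the greedy `fit_to_budget` policy are hypothetical and are not described in the talk. The numbers are chosen only so that the total matches the 42 credits shown on the slide.

```python
from typing import Dict, List, Tuple

Wish = Tuple[str, int]  # (resource name, units to isolate)

# Hypothetical price list: credits charged per isolated unit of each resource.
PRICE_PER_UNIT: Dict[str, int] = {"CPU.LLC": 2, "CPU.TLB": 5}


def cost(constraints: List[Wish]) -> int:
    """Tenant who knows its constraints: price them in credits."""
    return sum(PRICE_PER_UNIT[resource] * units for resource, units in constraints)


def fit_to_budget(wishlist: List[Wish], budget: int) -> Tuple[List[Wish], int]:
    """Tenant with a budget: grant wishes in priority order until credits run out."""
    granted: List[Wish] = []
    remaining = budget
    for resource, units in wishlist:
        price = PRICE_PER_UNIT[resource]
        affordable = min(units, remaining // price)
        if affordable > 0:
            granted.append((resource, affordable))
            remaining -= affordable * price
    return granted, remaining


known = [("CPU.LLC", 16), ("CPU.TLB", 2)]
print(cost(known))  # 16 * 2 + 2 * 5 = 42

granted, left = fit_to_budget([("CPU.LLC", 16), ("CPU.TLB", 4)], budget=42)
print(granted, left)
```

Under this pricing a tenant pays more for every extra constraint, which is one simple way to encourage specifying only the relevant ones.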

  28. Summary: Stratus
      Managing microarchitectural resources in a principled manner
      Three key ideas:
      1. A declarative interface
      2. A Cloud Knowledge Base (CKB)
      3. Isolation Credits

  29. Challenges and Discussion Points
      Enforcing isolation
      ● Mechanisms
      ● Policies
      Right constraints
      ● Profile guided
      ● Security libraries
      Scalability
      ● CKB
      ● O(1-10ms)
      Discussion:
      ● Is microarchitectural resource management really worth it?
      ● More efforts on mechanisms or better policies?
      ● Can we have better hardware support from vendors?
      ● What are we missing from a cloud-operation point of view?
