
Security Needs a Better Hardware-Software Contract | Gernot Heiser


  1. Security Needs a Better Hardware-Software Contract
     Gernot Heiser | gernot@unsw.edu.au | @GernotHeiser
     DAC’19, Las Vegas, 5 June 2019
     https://trustworthy.systems

  2. Threats
     • Speculation: an “unknown unknown” until recently
     • Microarchitectural timing channels: a “known unknown” for decades

  3. What Are Timing Channels?

  4. Timing Channels
     Information leakage through timing of events
     • Typically by observing response latencies or own execution speed
     Covert channel: information flow that bypasses the security policy
     • Trojan (High) encodes info, Spy (Low) observes
     Side channel: covert channel exploitable without insider help
     • Victim (High) executes normally, Attacker (Low) observes

  5. Cause: Competition for Shared HW Resources
     [Diagram: High and Low domains running on shared hardware]
     Affect execution speed:
     • Inter-process interference
     • Competing access to microarchitectural features
     • Hidden by the HW-SW contract!

  6. Security: A HW-SW Codesign Issue

  7. Enforcing Security
     • Operating system: enforce policies
     • Hardware (CPU etc.): provide mechanisms
     [Diagram: High and Low domains on top of the operating system, which meets the hardware at the HW-SW contract]

  8. Why Hardware Cannot Do Security Alone
     • Security policies are high-level
       • Coarse-grain: “applications” are sets of cooperating processes
       • Hardware mechanisms are fine-grain: instructions, pages, address spaces
       • Much semantics lost in mapping to hardware level
     • Security policies are complex: “Can A talk to B?” is too simple
       • maybe one-way communication is allowed
       • maybe communication is allowed under certain conditions
       • maybe low-bandwidth leakage doesn’t matter
       • maybe secrets only matter for a short time
       • maybe only a subset of {confidentiality, integrity, availability} is important

  9. Why the ISA is an Insufficient Contract
     • The ISA is a purely operational contract
       • Sufficient for ensuring functional correctness
       • Insufficient for ensuring confidentiality or availability
     • The ISA intentionally abstracts time away
     • Affect execution speed: availability violation
     • Observe execution speed: confidentiality violation

  10. What Is Needed?

  11. Confidentiality Needs Time Protection
      Traditionally OSes enforce security by memory protection, i.e. enforcing spatial isolation
      Time protection: a collection of OS mechanisms which collectively prevent interference between security domains that make execution speed in one domain dependent on the activities of another. [Ge et al. EuroSys’19]

  12. Time Protection: Partition Hardware
      • Temporally partition: flush
        • Flushing useless for concurrent access (HW threads, cores)
      • Spatially partition
        • Cannot spatially partition on-core caches (L1, TLB, branch predictor, pre-fetchers): virtually-indexed, OS cannot control
      • Need both!
      [Diagrams: Low and High time-sharing a cache with a flush in between; Low and High each using a separate part of a cache]

  13. Requirements for Time Protection
      Timing channels can be closed iff the OS can
      • (spatially) partition (off-core state & stateless HW) or
      • reset (on-core state)
      all shared hardware

  14. Sharing 1: Stateless Interconnect
      H/W is bandwidth-limited
      • Interference during concurrent access
      • Generally reveals no data or addresses
      • Must encode info into access patterns
      • Only usable as covert channel, not side channel
      No effective defence with present hardware!
      [Diagram: High and Low domains competing for the shared interconnect to memory]

  15. Sharing 2: Stateful Hardware
      HW is capacity-limited
      • Interference during
        • concurrent access
        • time-shared access
      • Collisions reveal addresses
      • Usable as side channel
      Any state-holding microarchitectural feature:
      • cache, branch predictor, pre-fetcher state machine
      Solvable problem: the focus of this work

  16. Implementing Time Protection on Stateful Hardware

  17. Spatial Partitioning: Cache Colouring
      • Partitions get frames of disjoint colours
      • seL4: userland supplies kernel memory ⇒ colouring userland colours dynamic kernel memory
      • Per-partition kernel image to colour kernel [Ge et al. EuroSys’19]
      [Diagram: High and Low partitions, each with its own TCB and PT, mapped to disjoint parts of cache and RAM]
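The colouring idea on this slide can be sketched in a few lines. This is a minimal illustration, not the seL4 implementation: it assumes a physically indexed, set-associative cache with a simple (unhashed) set index, and the function names, cache geometry and frame range are all hypothetical.

```python
def num_colours(cache_bytes, ways, page_bytes=4096):
    """Number of page colours: how many pages fit side by side in one cache way."""
    return cache_bytes // (ways * page_bytes)

def colour_of(frame_number, colours):
    """A frame's colour is given by the low bits of its physical frame number."""
    return frame_number % colours

def frames_for(allowed_colours, frame_numbers, colours):
    """Select the frames a partition may use, given its set of colours."""
    return [f for f in frame_numbers if colour_of(f, colours) in allowed_colours]

# Assumed geometry: 2 MiB, 16-way cache with 4 KiB pages.
colours = num_colours(2 * 1024 * 1024, ways=16)
high = frames_for(set(range(0, colours // 2)), range(128), colours)
low = frames_for(set(range(colours // 2, colours)), range(128), colours)
```

Because the two partitions hold frames of disjoint colours, their data can never map to the same cache sets, so neither can evict the other's lines.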

  18. Temporal Partitioning: Flush on Switch
      Must remove any history dependence!
      1. T0 = current_time()
      2. Switch user context
      3. Flush on-core state (latency depends on prior execution!)
      4. Touch all shared data needed for return (ensure deterministic execution)
      5. while (current_time() < T0 + WCET) ; (time padding to remove dependency)
      6. Reprogram timer
      7. return
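The padding step can be illustrated with a small user-level sketch. It is not kernel code: `padded_switch`, `FakeClock` and the tick values are hypothetical, and a deterministic fake clock stands in for the hardware timer so that the effect is reproducible.

```python
import time

def padded_switch(do_switch, wcet, now=time.perf_counter_ns):
    """Run do_switch(), then busy-wait until wcet time units after entry."""
    t0 = now()                   # step 1: record entry time
    do_switch()                  # steps 2-4: work whose latency varies
    while now() < t0 + wcet:     # step 5: pad up to the worst case
        pass
    return now() - t0            # duration visible to the next domain

class FakeClock:
    """Illustrative clock: every read advances time by one tick."""
    def __init__(self):
        self.t = 0
    def __call__(self):
        self.t += 1
        return self.t
    def advance(self, ticks):
        self.t += ticks

fast, slow = FakeClock(), FakeClock()
d_fast = padded_switch(lambda: None, 10, now=fast)             # cheap flush
d_slow = padded_switch(lambda: slow.advance(5), 10, now=slow)  # costly flush
```

Both calls report the same duration, so the time taken by the switch no longer depends on how much state the previous domain left behind. This only holds if the chosen worst-case bound really covers the slowest switch.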

  19. Reality Check: Flushing On-Core State

  20. Evaluating Intra-Core Channels
      Mitigation on Intel and Arm processors:
      • Disable data prefetcher (just to be sure)
      • On context switch, perform all architected flush operations:
        • Intel: wbinvd + invpcid (no targeted L1-cache flush supported!)
        • Arm: DCCISW + ICIALLU + TLBIALL + BPIALL
      [Diagram: Low and High domains time-sharing a cache, with a flush on each switch]

  21. Methodology: Prime and Probe
      Trojan (High) encodes, Spy (Low) observes
      1. Spy: fill cache with own data
      2. Trojan: touch n cache lines (input signal)
      3. Spy: traverse cache, measure execution time (output signal)
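The three steps can be mimicked with a toy simulation. This is only a sketch under loud assumptions: a direct-mapped cache model with made-up hit and miss latencies replaces real hardware and real cycle counters, and all names are hypothetical.

```python
HIT_CYCLES, MISS_CYCLES = 1, 10   # assumed latencies, not measured values
NUM_SETS = 64                     # assumed number of cache sets

class DirectMappedCache:
    """Toy cache: each set remembers which party last touched it."""
    def __init__(self, num_sets):
        self.owner = [None] * num_sets

    def access(self, who, set_index):
        hit = self.owner[set_index] == who
        self.owner[set_index] = who
        return HIT_CYCLES if hit else MISS_CYCLES

def prime_and_probe(n):
    """Return the spy's probe time after the trojan touched n sets."""
    cache = DirectMappedCache(NUM_SETS)
    for s in range(NUM_SETS):          # 1. spy fills the cache with its own data
        cache.access("spy", s)
    for s in range(n):                 # 2. trojan touches n cache sets
        cache.access("trojan", s)
    # 3. spy traverses the cache and sums the access latencies
    return sum(cache.access("spy", s) for s in range(NUM_SETS))

probe_times = [prime_and_probe(n) for n in (0, 10, 20)]
```

In this model every set the trojan touched costs the spy one extra miss, so the probe time grows linearly with n. That dependence of output on input is the channel.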

  22. Methodology: Channel Matrix
      Channel matrix:
      • Conditional probability of observing time, t, given input, n
      • Represented as heat map: bright = high probability
      • Horizontal variation indicates channel
      [Plot: raw I-cache channel, Intel Sandy Bridge; x-axis: cache sets accessed; y-axis: probing time (cycles)]
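A channel matrix can be estimated from (input, time) samples as below. This is a minimal sketch with invented sample data; `channel_matrix` and `has_variation` are hypothetical helpers, and a real evaluation would need many samples and a statistical test rather than an exact comparison of rows.

```python
from collections import Counter, defaultdict

def channel_matrix(samples):
    """samples: iterable of (n, t) pairs. Returns {n: {t: P(t | n)}}."""
    counts = defaultdict(Counter)
    for n, t in samples:
        counts[n][t] += 1
    return {n: {t: c / sum(row.values()) for t, c in row.items()}
            for n, row in counts.items()}

def has_variation(matrix):
    """True if the time distribution differs between any two inputs."""
    rows = list(matrix.values())
    return any(row != rows[0] for row in rows[1:])

# Invented samples: input 0 mostly yields 100 cycles, input 1 yields 120.
leaky = channel_matrix([(0, 100), (0, 100), (0, 110), (0, 100),
                        (1, 120), (1, 120)])
closed = channel_matrix([(0, 100), (1, 100)])
```

Here `leaky` has different distributions for the two inputs, which corresponds to the variation across inputs seen in the heat map, while `closed` has identical ones.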

  23. I-Cache Channel With Full State Flush
      • Intel Sandy Bridge: CHANNEL!
      • Intel Haswell: CHANNEL!
      • Intel Skylake: no evidence of channel
      • HiSilicon A53: SMALL CHANNEL!
      [Plots: one channel matrix per processor; x-axis: input (cache sets); y-axis: output time (cycles)]

  24. HiSilicon A53 Branch History Buffer
      Branch history buffer (BHB): channel!
      • One-bit channel
      • All reset operations applied
      [Plot: spy execution time against Trojan signal (0 or 1)]

  25. Intel Haswell Branch Target Buffer
      Branch target buffer: channel!
      • All reset operations applied
      [Plot: spy execution time (cycles) against Trojan cache footprint]
      Found residual channels in all recent Intel and ARM processors examined!

  26. Intel Spectre Defences
      Intel added indirect branch control (IBC) feature, which closes most channels, but…
      • Intel Skylake branch history buffer: small channel!
      https://ts.data61.csiro.au/projects/TS/timingchannels/arch-mitigation.pml

  27. Requirements on Hardware

  28. New HW/SW Contract: aISA
      Augmented ISA supporting time protection
      For all shared microarchitectural resources:
      1. Resource must be spatially partitionable or flushable
      2. Concurrently shared resources must be spatially partitioned
      3. Resource accessed solely by virtual address must be flushed and not concurrently accessed
         • Implies cannot share HW threads across security domains!
      4. Mechanisms must be sufficiently specified for OS to partition or reset
      5. Mechanisms must be constant time, or of specified, bounded latency
      6. Desirable: OS should know if resettable state is derived from data, instructions, data addresses or instruction addresses
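Rules 1 to 3 can be read as predicates over a description of each shared resource. The sketch below is one possible encoding, not part of the proposed aISA: the `Resource` fields, the example resources and their property values are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    partitionable: bool        # OS can spatially partition it
    flushable: bool            # OS can reset its state
    concurrently_shared: bool  # used by several domains at the same time
    virtually_addressed: bool  # accessed solely by virtual address

def violations(r):
    """List which of rules 1-3 the resource fails (empty list = compliant)."""
    out = []
    if not (r.partitionable or r.flushable):
        out.append("rule 1: neither partitionable nor flushable")
    if r.concurrently_shared and not r.partitionable:
        out.append("rule 2: concurrently shared but not partitionable")
    if r.virtually_addressed and (not r.flushable or r.concurrently_shared):
        out.append("rule 3: virtually addressed but unflushable or concurrently shared")
    return out

# Hypothetical examples, with property values assumed for illustration.
l1 = Resource("L1 cache, one domain at a time", False, True, False, True)
smt_l1 = Resource("L1 cache shared by HW threads", False, True, True, True)
llc = Resource("last-level cache", True, False, True, False)
bus = Resource("stateless interconnect", False, False, True, False)
```

With these assumed properties the time-shared L1 and the colourable last-level cache pass, the L1 shared between hardware threads fails rules 2 and 3, and the interconnect fails rules 1 and 2, matching the earlier slide's point that present hardware offers no effective defence there.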

  29. THANK YOU Gernot Heiser | gernot@unsw.edu.au | @GernotHeiser https://trustworthy.systems
