The Ascend Secure Processor

Christopher Fletcher, MIT



  1. The Ascend Secure Processor. Christopher Fletcher, MIT

  2. Joint work with • Srini Devadas, Marten van Dijk • Ling Ren, Albert Kwon, Xiangyao Yu • Elaine Shi & Emil Stefanov • David Wentzlaff & Princeton Team (Mike, Tri, Jonathan, Alexey, Yaosheng) • Omer Khan

  3. Last talk: Intel SGX. (Figure labels: Data, Integrity, MEE, Address, Timing.)

  4. This talk: Ascend Processor. (Figure labels: Data, Integrity, Memory controller, Address, Timing.)

  5. Outline • Motivation + Oblivious RAM (ORAM) primer • ORAM in Hardware • Demo

  6. if (secret variable) { … scan memory … } Binary search. Observed trace (Op, Address, Time): (R, 0, 1), (W, 1, 10), (R, 5, 15), (R, 6, 16), (R, 7, 17). • SGX broken through page faults [Xu et al.’15] • Shared library usage [Zhuang et al.’04] • Search queries [Islam et al.’12]

  7. Oblivious RAM (ORAM) [Goldreich-Ostrovsky’96]. Provably removes all access pattern leakage. (Figure labels: on-chip, cache miss, ORAM Controller, chip pins, shuffled.)

  8. ORAM security definition • Access is a 3-tuple: (op = read/write, address, data) • Consider access sequences A and A’: A = [(op_1, address_1, data_1), (op_2, address_2, data_2), …], A’ = [(op_1’, address_1’, data_1’), (op_2’, address_2’, data_2’), …] • If |A| == |A’| then ORAM(A) ≈ ORAM(A’)

  9. Path ORAM [CCS’13]: “The ORAM”. (Figure labels: ORAM Controller (on-chip), chip pins, read/writes.)

  10. Block assigned to random path. Block lives on that path. (Figure: PosMap entries (A, 2) and (B, 3); off-chip tree with paths 1 to 4 holding blocks (A, 2) and (B, 3).)

  11. Empty space = dummy encryptions. (Figure labels: chip pins, not encrypted, encrypted, off-chip, PosMap entries (A, 2) and (B, 3); tree with paths 1 to 4 holding (A, 2), (B, 3), and dummy blocks.)

  12. Path ORAM Access: Read+write the path the block is assigned to.

  13. Path ORAM Access: Read+write the path the block is assigned to. (Figure: PosMap entry for A changes from 2 to 4, B stays at 3; off-chip tree with paths 1 to 4 holding blocks and dummies; stash.)

  14. Typically, 4 slots per bucket (“Z=4”). Z=1 … for simplicity. (Figure: off-chip tree with paths 1 to 4 holding (B, 3) and dummy blocks.)

  15. Too big! (Figure labels: off-chip; (A, 4), (B, 3).)

  16. Map recursion [Shi et al.’11]. (Figure labels: PosMap ORAM, PosMap ORAM 2, block, on-chip, Map, Map’; entries (A, 4), (B, 3); “Smaller”, “Small enough”.)

  17. Map recursion [Shi et al.’11]. (Figure labels: Data ORAM, PosMap ORAM 1, PosMap ORAM 2, on-chip PosMap.)
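A minimal sketch of the recursion idea, assuming each PosMap ORAM block packs a fixed number of path labels and that recursion stops once the remaining map fits on-chip. The function names and the parameters `labels_per_block` and `onchip_entries` are hypothetical and do not come from the slides.

```python
def posmap_oram_sizes(num_blocks, labels_per_block, onchip_entries):
    """Sizes (in blocks) of PosMap ORAM 1, 2, ... needed until the map fits on-chip.

    Assumption: a map with n entries is stored in an ORAM of
    ceil(n / labels_per_block) blocks, which then needs its own, smaller map.
    """
    sizes = []
    entries = num_blocks  # the Data ORAM needs one map entry per block
    while entries > onchip_entries:
        entries = -(-entries // labels_per_block)  # ceiling division
        sizes.append(entries)
    return sizes


def posmap_addresses(addr, labels_per_block, num_posmap_orams):
    """Block address to look up in PosMap ORAM 1, 2, ... for data address addr."""
    return [addr // labels_per_block ** i for i in range(1, num_posmap_orams + 1)]


# Example: 2**20 data blocks, 16 labels per PosMap block, 1024 on-chip entries.
sizes = posmap_oram_sizes(2**20, 16, 1024)
lookups = posmap_addresses(70000, 16, len(sizes))
```

Under these assumptions every level shrinks the map by a factor of `labels_per_block`, so the number of PosMap ORAMs grows only logarithmically with the Data ORAM size.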

  18. Path ORAM summary: Blocks assigned to paths. Access block: read+write path. Adversary sees: random paths. (Figure: block B; paths 4, 2, 3, 1.)
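The summary can be made concrete with a toy model. This is a sketch under simplifying assumptions, not the Ascend controller: there is no encryption, no map recursion, the stash is unbounded, empty bucket slots stand in for dummy blocks, and paths are numbered from 0 rather than 1. The class and attribute names (`PathORAM`, `trace`, and so on) are hypothetical.

```python
import random


class PathORAM:
    """Toy Path ORAM: blocks live on their assigned path or in the stash."""

    def __init__(self, depth, z=4, rng=None):
        self.z = z                          # slots per bucket
        self.rng = rng or random.Random()
        self.num_leaves = 1 << depth
        # Heap layout: node 1 is the root; leaves are num_leaves .. 2*num_leaves - 1.
        self.tree = {n: [] for n in range(1, 2 * self.num_leaves)}
        self.posmap = {}                    # address -> assigned path (leaf index)
        self.stash = {}                     # address -> data
        self.trace = []                     # paths an adversary would observe

    def _path(self, leaf):
        """Tree nodes on the path to `leaf`, ordered leaf first, root last."""
        node = self.num_leaves + leaf
        nodes = []
        while node >= 1:
            nodes.append(node)
            node //= 2
        return nodes

    def access(self, op, addr, data=None):
        # Look up the current path; unseen addresses get a random one.
        leaf = self.posmap.get(addr, self.rng.randrange(self.num_leaves))
        # Remap the block to a fresh random path before touching memory.
        self.posmap[addr] = self.rng.randrange(self.num_leaves)
        self.trace.append(leaf)
        nodes = self._path(leaf)
        # Read the whole path into the stash.
        for node in nodes:
            for a, d in self.tree[node]:
                self.stash[a] = d
            self.tree[node] = []
        result = self.stash.get(addr)
        if op == "write":
            self.stash[addr] = data
        # Write the path back, placing blocks as deep as their new path allows.
        for node in nodes:
            fits = [a for a in self.stash if node in self._path(self.posmap[a])]
            for a in fits[: self.z]:
                self.tree[node].append((a, self.stash.pop(a)))
        return result


oram = PathORAM(depth=3, z=4, rng=random.Random(0))
for i in range(8):
    oram.access("write", i, f"v{i}")
values = [oram.access("read", i) for i in range(8)]
```

Every access reads and rewrites exactly one root-to-leaf path, and because the block is remapped before each access, `trace` holds independently drawn random paths regardless of which addresses were requested.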

  19. ORAM in Hardware

  20. First ORAM in silicon; Ascend in silicon • Collaboration with David Wentzlaff’s group @ Princeton. (Photo label: MIT Team.)

  21. First silicon fully functional @ 500 MHz & 0.9 V. Design (Verilog) open source.

  22. Blocks must live on assigned path or in stash. Can overflow. (Figure: PosMap entries (A, 4) and (B, 2); off-chip tree with paths 1 to 4; stash.)

  23. Blocks must live on assigned path or in stash. Bottleneck in prior work [Maas et al.’13]: causes 3X avg. slowdown on SPEC. (Figure: PosMap entries (A, 4) and (B, 2); off-chip tree with paths 1 to 4; stash.)

  24. Bit blasted stash eviction [FCCM’15]. Bit vectors: Path_block, Path_evict, occ. Can be pipelined. def evict(Path_block, Path_evict, occ): t1 = Path_block ⊕ Path_evict; t2 = bit_reverse(((t1 ∧ -t1) - 1) ∧ occ); return bit_reverse(t2 ∧ -t2). evict() ~ circuit. occ’ ≈ greatest common prefix.
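The bit tricks on this slide can be tried out in software. The sketch below is one possible reading, not the hardware design. It assumes bit i of a path label is the branch taken below tree level i, and it takes the occupancy vector as a `free` mask in which bit i is 1 when the bucket at level i still has room. Python integers are unbounded, so the vectors are masked to a fixed width. The names `evict_level`, `free`, and `depth` are hypothetical.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    out = 0
    for i in range(width):
        if (x >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def evict_level(path_block, path_evict, free, depth):
    """Deepest level (0 = root) on path_evict where the block may go, or None.

    path_block: path the stash block is assigned to (depth-bit label)
    path_evict: path currently being written back (depth-bit label)
    free:       (depth + 1)-bit mask, bit i set if level i has a free slot
    """
    width = depth + 1
    mask = (1 << width) - 1
    # Shift by one so bit 0 stands for the root, which every path shares.
    t1 = ((path_block ^ path_evict) << 1) & mask
    # t1 & -t1 isolates the first level where the paths diverge; subtracting 1
    # yields ones at every shared level (all ones if the paths are equal).
    shared = ((t1 & -t1) - 1) & mask
    # Reverse, isolate the lowest set bit, reverse back: this picks the
    # highest set bit, i.e. the deepest shared level that has room.
    t2 = bit_reverse(shared & free, width)
    one_hot = bit_reverse(t2 & -t2, width)
    return one_hot.bit_length() - 1 if one_hot else None


# Paths 0b101 and 0b001 share levels 0..2; with every level free the block
# goes to level 2, and with only levels 0 and 1 free it goes to level 1.
deepest_all_free = evict_level(0b101, 0b001, 0b1111, depth=3)
deepest_top_only = evict_level(0b101, 0b001, 0b0011, depth=3)
```

Each step is a fixed-width XOR, AND, negate, decrement, or bit reversal (pure wiring in hardware), which is consistent with the slide's remark that evict() maps to a circuit and can be pipelined.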

  25. Simple design, no performance bottleneck.

  26. Integrity protection for ORAM: overlay hash tree. (Figure labels: cache miss, ORAM, shuffled.)

  27. FPGA prototype. (Figure labels: ORAM logic, test harness, SHA-3; one SHA-3 unit.)

  28. Cheap Integrity Scheme [ASPLOS’15] • Per-block MAC: {Block data, Hash(Block data, Block addr, counter)} • Good: Hash 1 block, NOT path • Bad: Need to store counters on-chip • Replace entries in map with counters!
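A per-block MAC of this shape can be sketched with Python's standard library. The sketch assumes HMAC with SHA3-256 as the keyed hash and 64-bit big-endian encodings for the counter and address. The slides do not specify the construction or field layout, so `seal`, `verify`, and the encoding are hypothetical. Where the trusted counter is stored (the slide proposes the position map) is outside this sketch.

```python
import hashlib
import hmac
import struct


def seal(key, addr, counter, data):
    """Return (data, tag), where tag binds the data to its address and counter."""
    msg = struct.pack(">QQ", counter, addr) + data
    tag = hmac.new(key, msg, hashlib.sha3_256).digest()
    return data, tag


def verify(key, addr, counter, data, tag):
    """Recompute the tag from the trusted counter and compare in constant time."""
    msg = struct.pack(">QQ", counter, addr) + data
    expected = hmac.new(key, msg, hashlib.sha3_256).digest()
    return hmac.compare_digest(expected, tag)


key = b"k" * 32                      # placeholder key for illustration only
data, tag = seal(key, addr=7, counter=3, data=b"block")
fresh_ok = verify(key, 7, 3, data, tag)
replay_ok = verify(key, 7, 4, data, tag)   # counter has advanced: stale block
```

Only the one block of interest is hashed per access. Because the counter is part of the MAC input, a stale copy replayed after the counter has advanced fails verification.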

  29. Want: Path P = PosMap[A]. Algorithm: given A, derive A’, A’’, A’’’. P’ = PRF(A’ || PosMap[A’] = Counter); P’’ = PRF(A’’ || ORAMAccess(A’’, P’)); P = PRF(A’’’ || ORAMAccess(A’’’, P’’)); Data = ORAMAccess(A, P). Block A’’: MAC_K(Counter || A’’ || data). On-chip PosMap (root of trust). Integrity checks. Problem: |C| > |P|. More schemes … to get |C| < |P|. (Figure labels: Data ORAM, PosMap ORAM 1, PosMap ORAM 2; counter for A’’’, counter for A’’’ + 1; block (A, D).)

  30. Cheap Integrity Scheme [ASPLOS’15]. Result: hashing decreased by 68X, simple design.

  31. ORAM randomizes data layout. Computer architecture assumes data locality. Subtree size = row size [ISCA’13].

  32. # Row misses: ~ tree height vs. ~ tree height / subtree height

  33. Row misses: 60% overhead vs. 13% overhead

  34. Chip floorplan: 460M transistors; Tile 0 to Tile 24; PLL; ORAM. (Dimension labels: 6 mm, 6 mm, 2 mm, .5 mm. ORAM component labels: AES rounds, encryption, stash, stash evict(), recursion, PLB, integrity, hash unit.)

  35. Slowdown vs. insecure. 2 DRAM channels, in-order core, 2-level cache hierarchy, 1 MByte last-level cache. ORAM = 1208 cycles / tree lookup. (Plot: y-axis Slowdown (X), 1 to 11; x-axis tpcc, ycsb, astar, bzip2, gcc, gob, h264, libq, mcf, omnet, perl, sjeng, avg.)

  36. Demo

  37. Backup
