The Ascend Secure Processor Christopher Fletcher MIT 1
Joint work with
• Srini Devadas, Marten van Dijk
• Ling Ren, Albert Kwon, Xiangyao Yu
• Elaine Shi & Emil Stefanov
• David Wentzlaff & Princeton Team (Mike, Tri, Jonathan, Alexey, Yaosheng)
• Omer Khan 2
Last talk: Intel SGX Data Integrity MEE Address Timing 3
Ascend Processor This talk Data Integrity Memory controller Address Timing 4
Outline • Motivation + Oblivious RAM (ORAM) primer • ORAM in Hardware • Demo 5
If (secret variable) { … scan memory … }
Op, Address, Time: (R, 0, 1), (W, 1, 10), (R, 5, 15), (R, 6, 16), (R, 7, 17)
Binary search
• SGX broken through page faults [Xu et al.’15]
• Shared library usage [Zhuang et al.’04]
• Search queries [Islam et al.’12] 6
Oblivious RAM (ORAM) [Goldreich-Ostrovsky’96] On-chip: cache miss goes to the ORAM Controller. Beyond the chip pins: shuffled memory. Provably removes all access pattern leakage 7
ORAM security definition
• Access is a 3-tuple: (op = read/write, address, data)
• Consider access sequences A and A’
  A = [ (op_1, address_1, data_1), (op_2, address_2, data_2), … ]
  A’ = [ (op_1’, address_1’, data_1’), (op_2’, address_2’, data_2’), … ]
• If |A| == |A’| then ORAM(A) ≈ ORAM(A’) 8
Path ORAM [CCS’13] “The ORAM” Chip pins ORAM Controller (on-chip) Read/writes 9
Block assigned to random path. Block lives on that path. PosMap: (A, 2), (B, 3). Off-chip tree (paths 1 to 4) holds (A, 2) and (B, 3). 10
Empty space = dummy encryptions. Not encrypted: PosMap (A, 2), (B, 3). Encrypted, off-chip (past the chip pins): tree (paths 1 to 4) holding (A, 2), (B, 3), and dummy blocks. 11
Path ORAM Access: Read+write the path the block is assigned to. 12
Path ORAM Access: Read+write the path the block is assigned to. Off-chip B, 3 PosMap dummy A, 2 4 A, 2 dummy dummy B, 3 A, 4 B, 3 dummy dummy dummy dummy Stash Path 1 2 3 4 13
Typically, 4 slots per bucket (“Z=4”). Z=1 shown here for simplicity. 14
Too big! Off-chip B, 3 A, 4 A, 4 B, 3 15
Map recursion [Shi et al.’11]: put Map (A, 4; B, 3) in a PosMap ORAM; its own map, Map’, is smaller; recurse (PosMap ORAM 2) until the on-chip map is small enough. 16
Map recursion [Shi et al.’11]: On-chip PosMap → PosMap ORAM 2 → PosMap ORAM 1 → Data ORAM 17
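A rough sketch of how deep the recursion goes, under assumptions the slides do not state: each PosMap ORAM block packs `labels_per_block` path labels, and recursion stops once the remaining map fits in `onchip_entries` entries. Both parameter names, and the function name `posmap_levels`, are hypothetical.

```python
def posmap_levels(num_blocks, labels_per_block, onchip_entries):
    """Return the block count of each ORAM, data ORAM first.

    The last entry is also the number of entries the on-chip PosMap must hold.
    Assumption: every PosMap ORAM block stores `labels_per_block` path labels.
    """
    if labels_per_block < 2:
        raise ValueError("recursion only shrinks the map if labels_per_block >= 2")
    sizes = [num_blocks]
    # While the map for the last ORAM is too big for on-chip storage,
    # store that map in another, smaller ORAM.
    while sizes[-1] > onchip_entries:
        sizes.append(-(-sizes[-1] // labels_per_block))  # ceiling division
    return sizes


# Example: 2**20 data blocks, 16 labels per PosMap block, 1024 on-chip entries.
print(posmap_levels(2**20, 16, 2**10))
```

With these example numbers the list has four entries: the data ORAM plus three PosMap ORAMs, the last of which needs a 256-entry on-chip map.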
Path ORAM summary
• Blocks assigned to paths.
• Access block: Read+write path.
• Adversary sees: random paths. 18
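The summary above can be sketched as a small functional model. This is not the Ascend hardware design: it omits encryption, dummy blocks, recursion, and stash-size limits, and the names `PathORAM`, `access`, and `trace` are hypothetical. The tree is stored in heap order (node 1 is the root), which is an implementation choice of this sketch.

```python
import random


class PathORAM:
    """Toy Path ORAM: unbounded stash, no encryption, PosMap kept in a dict."""

    def __init__(self, levels, z=4, seed=0):
        self.z = z                          # slots per bucket ("Z")
        self.num_leaves = 1 << levels
        self.rng = random.Random(seed)
        self.tree = {n: [] for n in range(1, 2 << levels)}  # heap-indexed buckets
        self.posmap = {}                    # address -> assigned path (leaf)
        self.stash = {}                     # address -> data
        self.trace = []                     # paths the adversary observes

    def _path(self, leaf):
        """Heap indices of the buckets on a path, leaf first, root last."""
        node = self.num_leaves + leaf
        nodes = []
        while node >= 1:
            nodes.append(node)
            node >>= 1
        return nodes

    def access(self, op, addr, data=None):
        # Look up the assigned path, then remap the block to a fresh random path.
        leaf = self.posmap.get(addr)
        if leaf is None:
            leaf = self.rng.randrange(self.num_leaves)
        self.posmap[addr] = self.rng.randrange(self.num_leaves)
        self.trace.append(leaf)

        # Read the whole path into the stash.
        path = self._path(leaf)
        for node in path:
            for a, d in self.tree[node]:
                self.stash[a] = d
            self.tree[node] = []

        value = self.stash.get(addr)
        if op == "write":
            self.stash[addr] = data

        # Write the path back, deepest bucket first. A block may go into a
        # bucket only if that bucket lies on the block's assigned path.
        for node in path:
            bucket = []
            for a in list(self.stash):
                if len(bucket) == self.z:
                    break
                if node in self._path(self.posmap[a]):
                    bucket.append((a, self.stash.pop(a)))
            self.tree[node] = bucket
        return value


oram = PathORAM(levels=4)
oram.access("write", "A", "hello")
print(oram.access("read", "A"))
```

Every access appends exactly one path to `trace`, and that path was chosen uniformly at random when the block was last remapped, which is the "adversary sees: random paths" property.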
ORAM in Hardware 19
First ORAM in silicon; Ascend in silicon • Collaboration with David Wentzlaff’s group @ Princeton • MIT Team 20
First silicon fully functional @ 500 MHz & 0.9 V. Design (Verilog) is open source. 21
Blocks must live on assigned path or in stash. Stash can overflow. PosMap: (A, 4), (B, 2). 22
Blocks must live on assigned path or in stash. Bottleneck in prior work [Maas et al. ’13]: causes 3X avg. slowdown on SPEC. 23
Bit blasted stash eviction [FCCM’15]
Bit vectors: Path_block, Path_evict, occ. Can be pipelined. evict() ~ circuit.
def evict(Path_block, Path_evict, occ):
    t1 = Path_block ⊕ Path_evict
    t2 = bit_reverse(((t1 ∧ −t1) − 1) ∧ occ)
    return bit_reverse(t2 ∧ −t2)
occ’ ≈ greatest common prefix 24
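A runnable Python sketch of the evict() routine, under conventions the slide does not spell out and which are assumptions here: bit i of a leaf label picks the branch from level i to level i+1, the XOR is shifted left by one so that bit 0 stands for the root (always shared), and `free` has bit i set when the bucket at level i on the eviction path still has room. The names `evict_level` and `free` are hypothetical.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    out = 0
    for i in range(width):
        out = (out << 1) | ((x >> i) & 1)
    return out


def evict_level(path_block, path_evict, free, levels):
    """One-hot mask of the deepest level that is on both paths and has room.

    Returns 0 when the block cannot be written back on this eviction.
    """
    width = levels + 1                      # levels 0 (root) .. `levels` (leaf)
    mask = (1 << width) - 1
    t1 = ((path_block ^ path_evict) << 1) & mask
    # t1 & -t1 isolates the first level where the paths diverge;
    # subtracting 1 turns that into a mask of all shared levels.
    shared = ((t1 & -t1) - 1) & mask
    # Reversing turns "highest set bit" into "lowest set bit", which
    # x & -x isolates; reversing again restores the level numbering.
    t2 = bit_reverse(shared & free, width)
    return bit_reverse(t2 & -t2, width)


# Paths 0b101 and 0b001 share levels 0 to 2; all buckets have room.
print(bin(evict_level(0b101, 0b001, 0b1111, levels=3)))
```

Each step is a fixed bitwise operation on short vectors, which is why the slide can describe evict() as a pipelinable circuit rather than a loop over the stash.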
Simple design, no performance bottleneck. 25
Integrity protection for ORAM: overlay hash tree on the shuffled memory, checked on each cache miss 26
FPGA prototype: ORAM logic, test harness, one SHA-3 unit 27
Cheap Integrity Scheme [ASPLOS’15]
• Per-block MAC: { Block data, Hash(Block data, Block addr, counter) }
• Good: Hash 1 block, NOT path
• Bad: Need to store counters on-chip
• Replace entries in map with counters! 28
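A minimal sketch of the per-block MAC idea, assuming HMAC over SHA3-256 as the keyed hash and 8-byte big-endian encodings for the counter and address (the slide does not fix either). `MacStore` and `block_mac` are hypothetical names; the `counters` dict stands in for trusted on-chip state and `memory` for untrusted storage.

```python
import hashlib
import hmac


def block_mac(key, addr, counter, data):
    """Keyed hash over (counter, address, data) for a single block."""
    msg = counter.to_bytes(8, "big") + addr.to_bytes(8, "big") + data
    return hmac.new(key, msg, hashlib.sha3_256).digest()


class MacStore:
    def __init__(self, key):
        self.key = key
        self.counters = {}   # trusted: addr -> write counter
        self.memory = {}     # untrusted: addr -> (data, tag)

    def write(self, addr, data):
        counter = self.counters.get(addr, 0) + 1
        self.counters[addr] = counter
        self.memory[addr] = (data, block_mac(self.key, addr, counter, data))

    def read(self, addr):
        data, tag = self.memory[addr]
        expected = block_mac(self.key, addr, self.counters[addr], data)
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        return data


store = MacStore(b"k" * 16)
store.write(7, b"v1")
print(store.read(7))
```

Because the counter is part of the hashed message, replaying an older (data, tag) pair for the same address fails the check. Only the one block being read is hashed, not the whole path.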
Want: Path P = PosMap[A]
Algorithm: Given A: derive A’, A’’, A’’’
  P’ = PRF(A’ || PosMap[A’] = Counter)
  P’’ = PRF(A’’ || ORAMAccess(A’’, P’))
  P = PRF(A’’’ || ORAMAccess(A’’’, P’’))
  Data = ORAMAccess(A, P)
Block A’’: MAC_K(Counter || A’’ || data), holding the counter for A’’’ and the counter for A’’’+1
Figure labels: On-chip PosMap (root of trust), PosMap ORAM 2, PosMap ORAM 1, Data ORAM, Block (A, D), Integrity checks
Problem: |C| > |P|. More schemes … to get |C| < |P|
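The P = PRF(address || counter) step can be sketched as below. The slide does not name the PRF, so this assumes HMAC over SHA3-256, 8-byte big-endian encodings, and reduction modulo the number of leaves; `leaf_from_counter` is a hypothetical name.

```python
import hashlib
import hmac


def leaf_from_counter(key, addr, counter, levels):
    """Derive a block's path label from its address and access counter."""
    msg = addr.to_bytes(8, "big") + counter.to_bytes(8, "big")
    digest = hmac.new(key, msg, hashlib.sha3_256).digest()
    return int.from_bytes(digest, "big") % (1 << levels)


key = b"k" * 16
# Incrementing the counter on each access remaps the block to a new
# pseudorandom path without storing the path itself.
print(leaf_from_counter(key, addr=42, counter=1, levels=20))
print(leaf_from_counter(key, addr=42, counter=2, levels=20))
```

Storing the counter instead of the path is what lets one value serve both the position map and the MAC freshness check.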
Cheap Integrity Scheme [ASPLOS’15] Result: hashing decreased by 68X, simple design 30
ORAM randomizes data layout. Computer architecture assumes data locality. Subtree size = row size [ISCA’13] 31
# Row misses: ~ tree height (without subtree layout) vs. ~ tree height / subtree height (with subtree layout) 32
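A small sketch of why the subtree layout cuts row misses, assuming a heap-indexed tree in which every k-level subtree is packed into exactly one DRAM row (k = 1 models one bucket per row). The function name and parameters are hypothetical.

```python
def rows_on_path(leaf, levels, k):
    """Count distinct DRAM rows touched by one path access.

    Assumption: the tree is cut into layers of k levels, and each
    k-level subtree occupies its own row.
    """
    node = (1 << levels) + leaf             # heap index of the leaf bucket
    rows = set()
    for level in range(levels, -1, -1):
        # The subtree containing this bucket is identified by the heap
        # index of its root, which is the ancestor (level % k) steps up.
        rows.add(node >> (level % k))
        node >>= 1                          # move to the parent bucket
    return len(rows)


# 24-level tree (levels 0..23): one row per bucket vs. 4-level subtrees.
print(rows_on_path(leaf=5, levels=23, k=1), rows_on_path(leaf=5, levels=23, k=4))
```

The count is the number of k-level layers the path crosses, so it falls from about the tree height to about tree height / subtree height.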
Row misses: 60% overhead 13% overhead 33
Chip: 460M transistors, 6 mm × 6 mm die, Tile 0 to Tile 24, PLL. ORAM block (dimension labels: .5 mm, 2 mm) contains: AES rounds, Encryption, Stash, Stash evict(), Recursion, PLB, Integrity, Hash unit. 34
Setup: 2 DRAM channels, in-order core, 2-level cache hierarchy, 1 MByte last-level cache. ORAM = 1208 cycles / tree lookup. [Chart: Slowdown (X) vs. insecure, axis from 1 to 11; benchmarks: tpcc, ycsb, astar, bzip2, gcc, gob, h264, libq, mcf, omnet, perl, sjeng, avg] 35
Demo 36
Backup 37