The Ascend Secure Processor (Christopher Fletcher, MIT)


slide-1
SLIDE 1

The Ascend Secure Processor

Christopher Fletcher MIT

slide-2
SLIDE 2

Joint work with

  • Srini Devadas, Marten van Dijk
  • Ling Ren, Albert Kwon, Xiangyao Yu
  • Elaine Shi & Emil Stefanov
  • David Wentzlaff & Princeton Team (Mike, Tri, Jonathan, Alexey, Yaosheng)
  • Omer Khan

slide-3
SLIDE 3

Last talk: Intel SGX

[Figure: the Memory Encryption Engine (MEE), shown against four protection targets: Data, Integrity, Address, Timing.]

slide-4
SLIDE 4

This talk: Ascend Processor

[Figure: the Ascend memory controller, shown against the same four protection targets: Data, Integrity, Address, Timing.]

slide-5
SLIDE 5

Outline

  • Motivation + Oblivious RAM (ORAM) primer
  • ORAM in Hardware
  • Demo 

slide-6
SLIDE 6
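Access patterns leak information:

  • SGX broken through page faults [Xu et al.’15]
  • Shared library usage [Zhuang et al.’04]
  • Search queries [Islam et al.’12]

Example access trace (the Time entry of the first row is missing in the source):

    Op   Address   Time
    R    1
    W    1         10
    R    5         15
    R    6         16
    R    7         17

Programs whose access pattern depends on secrets:

  • If (secret variable) { … scan memory … }
  • Binary search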
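The binary search example can be made concrete with a small sketch. It is not from the talk: `binary_search_trace` and the sample table are hypothetical, and list indices stand in for the addresses an adversary would observe on the memory bus.

```python
def binary_search_trace(sorted_values, key):
    """Binary search that also records every index it reads.

    The recorded indices model the addresses visible off-chip.
    """
    lo, hi = 0, len(sorted_values) - 1
    trace = []
    while lo <= hi:
        mid = (lo + hi) // 2
        trace.append(mid)              # this read is visible to the adversary
        if sorted_values[mid] == key:
            return True, trace
        if sorted_values[mid] < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, trace


# Hypothetical table of 16 sorted values: 0, 2, ..., 30.
table = list(range(0, 32, 2))
_, trace_a = binary_search_trace(table, 4)    # secret key 4
_, trace_b = binary_search_trace(table, 26)   # secret key 26
```

The two traces differ after the first probe, so the address sequence alone narrows down the secret key even when the data itself is encrypted.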

slide-7
SLIDE 7

Oblivious RAM (ORAM) [Goldreich-Ostrovsky’96]

  • Provably removes all access pattern leakage.

[Figure: a cache miss goes to the on-chip ORAM controller, which reaches shuffled memory through the chip pins.]

slide-8
SLIDE 8

ORAM security definition

  • An access is a 3-tuple: (op = read/write, address, data)
  • Consider two access sequences A and A’:

    A  = [ (op1, address1, data1), (op2, address2, data2), … ]
    A’ = [ (op1’, address1’, data1’), (op2’, address2’, data2’), … ]

  • If |A| = |A’|, then ORAM(A) ≈ ORAM(A’): sequences of equal length look indistinguishable to the adversary.

slide-9
SLIDE 9

Path ORAM [CCS’13]

[Figure labels: ORAM Controller (on-chip); “The ORAM”; chip pins; reads/writes.]

slide-10
SLIDE 10

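Each block is assigned to a random path. The block lives on that path.

[Figure: off-chip tree with paths 1 to 4. The PosMap holds the entries (A, 2) and (B, 3); blocks (A, 2) and (B, 3) are stored in the tree.]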

slide-11
SLIDE 11

Empty space = dummy encryptions.

[Figure: off-chip tree with paths 1 to 4, blocks (A, 2) and (B, 3), and dummy blocks in every other slot. Labels: PosMap, chip pins. Legend: Encrypted / Not encrypted.]

slide-12
SLIDE 12

Path ORAM Access: Read+write the path the block is assigned to.

slide-13
SLIDE 13

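Path ORAM Access: Read+write the path the block is assigned to.

[Figure: off-chip tree with paths 1 to 4, a stash, and a PosMap holding (A, 2) and (B, 3). In the access shown, A's entry changes from (A, 2) to (A, 4); (B, 3) is unchanged. Empty slots on the path and in the stash hold dummies.]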

slide-14
SLIDE 14

Typically, 4 slots per bucket (“Z=4”). The figures use Z=1 for simplicity.

[Figure: off-chip tree with paths 1 to 4, block (B, 3), and dummies in the remaining slots.]
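The access rule above (read and write the whole assigned path, with Z slots per bucket and a stash) can be sketched as a toy model. This is an illustration under stated assumptions, not the Ascend design: the class name, the heap-style bucket numbering, and the greedy write-back order are choices made for this sketch, and encryption, dummy blocks, and stash-size limits are left out.

```python
import random


class PathORAM:
    """Toy Path ORAM: bucket tree + position map + stash (no encryption)."""

    def __init__(self, levels, z=4, seed=0):
        self.z = z                                # slots per bucket ("Z")
        self.num_leaves = 1 << levels             # one path per leaf
        self.rng = random.Random(seed)
        # Heap numbering: root is 1, leaves are num_leaves .. 2*num_leaves-1.
        self.tree = {i: [] for i in range(1, 2 * self.num_leaves)}
        self.posmap = {}                          # address -> assigned leaf
        self.stash = {}                           # address -> data
        self.trace = []                           # leaves touched (adversary view)

    def _path(self, leaf):
        """Bucket ids from the root down to the given leaf."""
        node, path = self.num_leaves + leaf, []
        while node >= 1:
            path.append(node)
            node //= 2
        return path[::-1]

    def access(self, addr, new_data=None):
        """Read addr (and overwrite it when new_data is given)."""
        leaf = self.posmap.get(addr, self.rng.randrange(self.num_leaves))
        self.posmap[addr] = self.rng.randrange(self.num_leaves)  # remap
        self.trace.append(leaf)
        path = self._path(leaf)
        for node in path:                         # read the whole path
            for a, d in self.tree[node]:
                self.stash[a] = d
            self.tree[node] = []
        data = self.stash.get(addr)
        if new_data is not None:
            self.stash[addr] = new_data
        for node in reversed(path):               # write back, deepest first
            depth = node.bit_length() - 1
            fits = [a for a in self.stash
                    if self._path(self.posmap[a])[depth] == node][: self.z]
            self.tree[node] = [(a, self.stash.pop(a)) for a in fits]
        return data
```

Calling `oram.access(addr, value)` writes and `oram.access(addr)` reads; `oram.trace` then holds one leaf label per access, each drawn at random when the block was last remapped.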

slide-15
SLIDE 15

The PosMap is too big!

[Figure: PosMap entries (A, 4) and (B, 3), marked “Too big!”; blocks (A, 4) and (B, 3) off-chip.]

slide-16
SLIDE 16

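Map recursion [Shi et al.’11]

  • Store the map in its own ORAM, the PosMap ORAM. That ORAM needs a smaller map, Map’.
  • Repeat (PosMap ORAM 2, …) until the remaining map is small enough to keep on-chip.

[Figure: blocks (A, 4) and (B, 3); Map → PosMap ORAM; Map’ (smaller) → PosMap ORAM 2; final map on-chip.]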

slide-17
SLIDE 17

Map recursion [Shi et al.’11]

[Figure: On-chip PosMap, PosMap ORAM 2, PosMap ORAM 1, Data ORAM.]
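A rough sizing sketch shows why a few levels of recursion suffice. The function name and all three parameters are hypothetical, chosen only to illustrate the shrinking map; the talk does not give these numbers.

```python
import math


def posmap_sizes(num_blocks, entries_per_block, on_chip_entries):
    """Entries in each map, from the data ORAM's PosMap to the on-chip map.

    Each recursion level packs `entries_per_block` map entries into one
    block of the next, smaller ORAM.
    """
    if entries_per_block < 2:
        raise ValueError("each PosMap block must hold at least 2 entries")
    sizes = [num_blocks]
    while sizes[-1] > on_chip_entries:
        sizes.append(math.ceil(sizes[-1] / entries_per_block))
    return sizes


# Assumed example: 2^20 data blocks, 16 entries per PosMap block,
# and room for 4096 entries on-chip.
sizes = posmap_sizes(2**20, 16, 4096)
```

With these assumed numbers the maps shrink from 1048576 to 65536 to 4096 entries, which corresponds to two PosMap ORAMs plus an on-chip PosMap.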

slide-18
SLIDE 18

Path ORAM summary

  • Blocks are assigned to paths.
  • Access block: read+write its path.
  • Adversary sees: random paths.

slide-19
SLIDE 19

ORAM in Hardware

slide-20
SLIDE 20

Ascend in silicon

  • Collaboration with David Wentzlaff’s group @ Princeton
  • First ORAM in silicon

[Photo label: MIT Team]

slide-21
SLIDE 21

  • First silicon fully functional @ 500 MHz & 0.9 V
  • Design (Verilog) is open source

slide-22
SLIDE 22

Blocks must live on their assigned path or in the stash. The stash can overflow.

[Figure: PosMap entries (A, 4) and (B, 2); off-chip tree with paths 1 to 4; stash.]

slide-23
SLIDE 23

Blocks must live on their assigned path or in the stash.

Bottleneck in prior work [Maas et al. ‘13]: causes 3× avg. slowdown on SPEC.

[Figure: PosMap entries (A, 4) and (B, 2); off-chip tree with paths 1 to 4; stash.]

slide-24
SLIDE 24

Bit blasted stash eviction [FCCM’15]

    def evict(Pathblock, Pathevict, occ):
        t1 = Pathblock ⊕ Pathevict
        t2 = bit_reverse( ((t1 ∧ −t1) − 1) ∧ occ )
        ret bit_reverse( t2 ∧ −t2 )

  • Pathblock, Pathevict and occ are bit vectors; the evict() circuit outputs occ’.
  • The masking step ≈ greatest common prefix of the two paths.
  • Can be pipelined.
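The bit trick can be tried out in software. The sketch below follows the slide's three steps but rests on assumptions the slide does not state: bit i of a path label is the branch taken when leaving level i, the extra left shift (added here) lines bit i of the mask up with tree level i so the root counts as shared, and bit i of `free` is set when level i still has room. The names `free` and `levels` are hypothetical.

```python
def bit_reverse(x, width):
    """Reverse the low `width` bits of x."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (x & 1)
        x >>= 1
    return out


def evict(path_block, path_evict, free, levels):
    """One-hot vector of the deepest level where the block can be placed.

    Returns 0 when no shared level on the eviction path has room.
    """
    width = levels + 1                      # levels 0 (root) .. `levels` (leaf)
    mask = (1 << width) - 1
    # Branch bits where the two paths differ, shifted so bit i = level i.
    t1 = ((path_block ^ path_evict) << 1) & mask
    # t1 & -t1 isolates the first divergence; minus 1 marks all shared levels.
    shared = ((t1 & -t1) - 1) & mask
    # Reversing turns "deepest set bit" into "lowest set bit".
    t2 = bit_reverse(shared & free, width)
    return bit_reverse(t2 & -t2, width)
```

For example, with `levels=3`, paths `0b101` and `0b001` share levels 0 to 2; if level 2 is full (`free=0b1011`), the result is `0b0010`, meaning the block goes to level 1.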

slide-25
SLIDE 25

Simple design, no performance bottleneck.

slide-26
SLIDE 26

Integrity protection for ORAM

  • Overlay a hash tree on the ORAM.

[Figure: cache miss, ORAM, shuffled memory, with the hash tree overlaid.]

slide-27
SLIDE 27

FPGA Prototype

[Figure: FPGA prototype with regions labeled ORAM logic, SHA-3, and Test harness. Callout: one SHA-3 unit.]

slide-28
SLIDE 28

Cheap Integrity Scheme [ASPLOS’15]

  • Per-block MAC: store {Block data, Hash(Block data, Block addr, counter)}
  • Good: Hash 1 block, NOT the whole path
  • Bad: Need to store counters on-chip
  • Fix: Replace entries in the map with counters!
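A per-block MAC of this shape can be sketched with the Python standard library. This is an illustration, not the ASPLOS’15 construction: HMAC with SHA3-256, the 8-byte field encoding, and the names `seal` and `open_block` are assumptions made for the sketch.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # SHA3-256 digest size in bytes


def _tag(key, addr, counter, data):
    # Bind the data to its address and to the current access counter.
    msg = struct.pack(">QQ", counter, addr) + data
    return hmac.new(key, msg, hashlib.sha3_256).digest()


def seal(key, addr, counter, data):
    """Return the stored form of a block: data followed by its MAC."""
    return data + _tag(key, addr, counter, data)


def open_block(key, addr, counter, stored):
    """Verify the MAC against the trusted counter and return the data."""
    data, tag = stored[:-TAG_LEN], stored[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(key, addr, counter, data)):
        raise ValueError("integrity check failed")
    return data
```

Because the counter is part of the MAC input, replaying an old copy of a block fails verification once the trusted counter has advanced, and only the one block is hashed rather than the whole path.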

slide-29
SLIDE 29

Want: Path P = PosMap[A]

Algorithm: Given A:

    derive A’, A’’, A’’’
    P’   = PRF(A’ || PosMap[A’]), where PosMap[A’] = Counter
    P’’  = PRF(A’’ || ORAMAccess(A’’, P’))
    P    = PRF(A’’’ || ORAMAccess(A’’’, P’’))
    Data = ORAMAccess(A, P)

Integrity checks: block A’’ holds the counters for A’’’ and A’’’+1, and is stored with MAC_K(Counter || A’’ || data).

Problem: |C| > |P| (a counter is wider than a path label). More schemes to get |C| < |P|.

[Figure: On-chip PosMap (root of trust) holding the counter for A’; PosMap ORAM 2 (path P’, block A’’); PosMap ORAM 1 (path P’’, block A’’’); Data ORAM (path P, block (A, D)).]
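The step "path = PRF(address || counter)" can be sketched as follows. HMAC-SHA3-256 stands in for the PRF here, and the function name, the 8-byte encoding, and the reduction modulo the number of leaves are assumptions for illustration; the slide does not say which PRF or encoding the design uses.

```python
import hashlib
import hmac
import struct


def prf_path(key, addr, counter, levels):
    """Derive a leaf label in [0, 2**levels) from an address and its counter."""
    msg = struct.pack(">QQ", addr, counter)
    digest = hmac.new(key, msg, hashlib.sha3_256).digest()
    # Power-of-two modulus, so the reduction adds no bias.
    return int.from_bytes(digest[:8], "big") % (1 << levels)


key = b"s" * 16                                   # hypothetical on-chip secret
old_leaf = prf_path(key, addr=7, counter=41, levels=20)
new_leaf = prf_path(key, addr=7, counter=42, levels=20)   # remap = counter + 1
```

Under this scheme remapping a block means incrementing its counter, so the map only needs to store counters, and the same counter can feed the block's MAC.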

slide-30
SLIDE 30

Cheap Integrity Scheme [ASPLOS’15]

Result: hashing reduced by 68×, simple design.

slide-31
SLIDE 31

ORAM randomizes data layout. Computer architecture assumes data locality.

Subtree layout [ISCA’13]: subtree size = row size.

slide-32
SLIDE 32

# Row misses per path access:

  • Without subtrees: ≈ tree height
  • With subtrees: ≈ tree height / subtree height
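The ratio can be checked with a short count. The sketch assumes each subtree spans `subtree_height` consecutive levels and fits in exactly one row, so a root-to-leaf path opens one row per subtree it crosses; the function name and the example heights are hypothetical.

```python
def rows_touched(tree_height, subtree_height):
    """Number of distinct rows opened by one root-to-leaf path.

    Levels are numbered 0 .. tree_height-1; level // subtree_height
    identifies the subtree layer (and hence the row) a bucket falls in.
    """
    return len({level // subtree_height for level in range(tree_height)})


no_subtrees = rows_touched(24, 1)     # one bucket per row
with_subtrees = rows_touched(24, 4)   # 4-level subtrees packed per row
```

For an assumed 24-level tree this gives 24 row misses per path without packing and 6 with 4-level subtrees, matching tree height / subtree height.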

slide-33
SLIDE 33

Row misses: 60% overhead → 13% overhead

slide-34
SLIDE 34

[Figure: chip floorplan, 460M transistors. Blocks: Tile 0 to Tile 24, ORAM, PLL. Dimension labels: 6 mm × 6 mm; 2 mm × 0.5 mm. ORAM regions: Encryption; Stash; Recursion, PLB, Integrity. Callouts: AES rounds, Stash evict(), Hash unit.]

slide-35
SLIDE 35

Slowdown vs. insecure

[Figure: bar chart of slowdown (X), axis from 1 to 11, for tpcc, ycsb, astar, bzip2, gcc, gob, h264, libq, mcf, omnet, perl, sjeng, and avg.]

Setup: 2 DRAM channels, in-order core, 2-level cache hierarchy, 1 MByte last-level cache. ORAM = 1208 cycles / tree lookup.

slide-36
SLIDE 36

Demo

slide-37
SLIDE 37

Backup
