HOP: Hardware makes Obfuscation Practical
Kartik Nayak
With Chris Fletcher, Ling Ren, Nishanth Chandran, Satya Lokam, Elaine Shi and Vipul Goyal

Compression (figure labels: 1 KB, 1 MB)
- Used by everyone, perhaps license it
- No one should “learn” the algorithm: VBB obfuscation
- Another scenario: release patches without disclosing vulnerabilities

Known Results
Heuristic approaches to obfuscation [KKNVT’15, SK’11, ZZP’04]
- Efficient
- No guarantees
- “Confuse” the user
Impossible to achieve program obfuscation in general [BGIRSVY’01]

Weaker Notion of Obfuscation
Indistinguishability Obfuscation (iO) is achievable [BGIRSVY’01]
Construction via multilinear maps [GGHRSW’13]
- Not strong enough for practical applications
- Non-standard assumptions
- Inefficient [AHKM’14]
Example program (a point function):
    point_func(x) {
        if x == secret return 1;
        else return 0;
    }

Using Trusted Hardware Token
Program obfuscation, functional encryption using stateless tokens [GISVW’10, DMMN’11, CKZ’13]
- Boolean circuits
- Token functionality is program dependent
- Inefficient: using FHE, NIZKs
- Sending many tokens

Work on Secure Processors
Intel SGX, AEGIS [SCGDD’03], XOM [LTMLBMH’00]: encrypt memory, verify integrity
- Reveal memory access patterns
- Notion of obfuscation against software-only adversaries
Ascend [FDD’12], GhostRider [LHMHTS’15]
- Assume public programs; do not obfuscate programs

Key Contributions
1. Efficient obfuscation of RAM programs using stateless trusted hardware token
2. Design and implement hardware system called HOP
3. Scheme optimizations
Annotations on the slide:
- Boolean circuits; FHE, NIZKs
- Challenges in using stateless token
- Security under UC framework
- 5x-238x better than a baseline scheme
- 8x-76x slower than an insecure system

Using Trusted Hardware Token
[Figure: the sender (honest) stores a key in the token and obfuscates the program; the receiver (malicious) executes it on Input, Input2, Input3 and obtains Output, Output2, Output3]

Ideal Functionality for Obfuscation
[Figure: Sender, Trusted third party, Receiver; message labels: prog id, (prog id, inp), output]

Stateful Token
Maintains state between invocations
- Oblivious RAM
- Authenticate memory
- Run for a fixed time T
Token state in figure: auth, oramSt
Example instructions:
    load a5, 0(s0)
    add a5, a4, a5
    add a5, a5, a5

A scheme with stateless tokens is more challenging
- Enables context switching
- Given a scheme with stateless tokens, using stateful tokens can be viewed as an optimization

Stateless Token
Does not maintain state between invocations
- Oblivious RAM
- Authenticated encryption
Figure labels: PID, auth, oramSt (shown twice)
Example instructions:
    load a5, 0(s0)
    add a5, a4, a5
    add a5, a5, a5

Stateless Token - Rewinding
[Figure: token with Oblivious RAM; state labels: PID, auth’, oramSt’]
Execution trace:
    Time 0: load a5, 0(s0)
    Time 1: add a5, a4, a5
    Rewind!
    Time 0: load a5, 0(s0)
    Time 1: add a5, a4, a5

Oblivious RAMs are generally not secure against rewinding adversaries [SCSL’11, PathORAM’13]

Binary-tree Paradigm for Oblivious RAMs
- Path identified by leaf node l
[Figure: Memory holds the tree, with block x on the path to leaf l; Token State holds the position map entry mapping x to l]

Block x Must Now Relocate!
[Figure: Memory holds the tree; Token State holds the position map entry mapping x to l]

Data-access Write Back
- New designated leaf node r
- Update position map
[Figure: Memory holds the tree; Token State holds the position map entry now mapping x to r]

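
The three preceding slides (look up the leaf, relocate the block, write back) can be sketched as a toy program. This is only an illustration under loud assumptions: the class name `ToyTreeORAM`, the unbounded buckets, the dictionary-based stash, and the use of Python's non-cryptographic `random` module are all hypothetical simplifications and are not taken from the talk or from HOP's actual ORAM.

```python
import random


class ToyTreeORAM:
    """Toy binary-tree ORAM. Illustrative only: not secure, not efficient."""

    def __init__(self, depth, rng=None):
        self.num_leaves = 2 ** depth
        self.rng = rng or random.Random()
        # Heap layout: node 1 is the root; leaves are num_leaves .. 2*num_leaves - 1.
        # Assumption: buckets are unbounded, so the stash always drains.
        self.buckets = {n: {} for n in range(1, 2 * self.num_leaves)}
        self.position = {}  # position map (token state): block id -> leaf
        self.stash = {}     # blocks temporarily held inside the token

    def _path(self, leaf):
        """Node ids from the root down to the given 0-based leaf index."""
        node = self.num_leaves + leaf
        path = []
        while node >= 1:
            path.append(node)
            node //= 2
        return path[::-1]

    def access(self, block, new_value=None):
        """Read (and optionally overwrite) a block.

        Returns (previous value, leaf whose path was touched). The leaf is
        what an observer of memory would see.
        """
        leaf = self.position.get(block)
        if leaf is None:  # never written before: touch an arbitrary path
            leaf = self.rng.randrange(self.num_leaves)
        path = self._path(leaf)

        # 1. Read every bucket on the path identified by the leaf.
        for node in path:
            self.stash.update(self.buckets[node])
            self.buckets[node] = {}
        value = self.stash.get(block)
        if new_value is not None:
            self.stash[block] = new_value

        # 2. Relocate: pick a new designated leaf and update the position map.
        self.position[block] = self.rng.randrange(self.num_leaves)

        # 3. Write back: place each stashed block as deep on this path as
        #    its own (possibly new) leaf allows.
        for node in reversed(path):
            for b in list(self.stash):
                if node in self._path(self.position[b]):
                    self.buckets[node][b] = self.stash.pop(b)
        return value, leaf
```

Calling `access("x", "hello")` followed by `access("x")` returns the stored value, and the second call touches the leaf that was assigned during the first. A real design bounds the bucket size and keeps overflow in the stash.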
A Rewinding Attack!
Leaves are numbered 0 to 7.
Access pattern 3, 4 (two different blocks):
    Time 0: leaf 4, reassigned …
    Time 1: leaf 1, reassigned …
    Rewind!
    Time 0: leaf 4, reassigned …
    Time 1: leaf 1, reassigned …
Access pattern 3, 3 (the same block twice):
    T = 0: leaf 4, reassigned 2
    T = 1: leaf 2, reassigned …
    Rewind!
    T = 0: leaf 4, reassigned 7
    T = 1: leaf 7, reassigned …
After a rewind, the leaf touched at time 1 changes (2, then 7) only when both accesses hit the same block, so the adversary can tell the two access patterns apart.

To counter rewinding attacks, the ORAM derives its randomness from PRF_K(program digest, input digest)

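
A minimal sketch of why PRF-derived randomness helps: if every leaf choice is a deterministic function of the token key, the program digest and the input digest, a rewound execution reproduces exactly the same leaves. The assumptions here are mine, not the talk's: HMAC-SHA256 stands in for the PRF, and the per-access `step` counter and the function names are hypothetical.

```python
import hashlib
import hmac


def prf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 used as a stand-in PRF over length-prefixed parts."""
    msg = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def leaf_for_step(key, program_digest, input_digest, step, num_leaves):
    """Leaf chosen at the given ORAM access.

    Assumption: a seed PRF_K(program digest, input digest) is expanded
    with a step counter. num_leaves should be a power of two so the
    reduction below is unbiased.
    """
    seed = prf(key, program_digest, input_digest)
    out = prf(seed, step.to_bytes(8, "big"))
    return int.from_bytes(out, "big") % num_leaves
```

Replaying the same step with the same program and input always yields the same leaf, so rewinding shows the adversary nothing it has not already seen. Changing the input digest yields an unrelated sequence.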
Stateless Token – Rewinding on inputs
Run 1: Inp 1 = 20, Inp 2 = 10, Inp 3 = 40
Run 2 (after rewinding): Inp 1 = 20, Inp 2 = 10, Inp 3 = 30
[Figure: token with Oblivious RAM; state labels: PID, auth’, oramSt’]

To counter rewinding on inputs, the adversary commits to an input digest during initialization

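
One way to picture the commitment is sketched below. This is a hypothetical illustration, not the scheme from the paper: it assumes the digest is a SHA-256 hash over the full input list and that the token binds it with an HMAC tag, so a stateless token can later reject any input that differs from what was committed. All function names and the `b"init"` label are made up for the example.

```python
import hashlib
import hmac


def input_digest(inputs):
    """Hash a list of inputs (length-prefixed so boundaries are unambiguous)."""
    h = hashlib.sha256()
    for value in inputs:
        data = str(value).encode()
        h.update(len(data).to_bytes(4, "big") + data)
    return h.digest()


def initialize(token_key, inputs):
    """Initialization: the receiver commits; the token returns a binding tag."""
    digest = input_digest(inputs)
    tag = hmac.new(token_key, b"init" + digest, hashlib.sha256).digest()
    return digest, tag


def accept_inputs(token_key, digest, tag, inputs):
    """Later invocation: the token accepts only the committed inputs."""
    expected = hmac.new(token_key, b"init" + digest, hashlib.sha256).digest()
    tag_ok = hmac.compare_digest(tag, expected)
    inputs_ok = hmac.compare_digest(digest, input_digest(inputs))
    return tag_ok and inputs_ok
```

With the inputs from the previous slide, committing to `[20, 10, 40]` and then presenting `[20, 10, 30]` after a rewind fails the check.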
Main Theorem: Informal
Our scheme UC-realizes the ideal functionality in the F_token-hybrid model, assuming
- ORAM satisfies obliviousness
- sstore adopts a semantically secure encryption scheme and a collision-resistant Merkle hash tree scheme
- the security of PRFs
Proof in the paper.

1. Efficient obfuscation of RAM programs using stateless trusted hardware token
2. Scheme optimizations
3. Design and implement hardware system called HOP
Next:
1. Interleaving arithmetic and memory instructions
2. Using a scratchpad

Optimizations to the Scheme – 1. A^N M Scheduling
Types of instructions: Arithmetic (A, 1 cycle) and Memory (M, ~3000 cycles)
Naïve schedule: A M A M A M …
Memory accesses are visible to the adversary
Histogram, main loop, under the naïve schedule:
    1170: load a5,0(a0)       M
    1174: addi a4,sp,64       A + dummy memory access
    1178: addi a0,a0,4        A + dummy memory access
    117c: slli a5,a5,0x2      A + dummy memory access
    1180: add a5,a4,a5        A
    1184: load a4,-64(a5)     M
    1188: addi a4,a4,1        A + dummy memory access
    118c: bne a3,a0,1170      A

Optimizations to the Scheme – 1. A^N M Scheduling
Program order:    A A A A M A A M
Naïve schedule:   A M A M A M A M A M A M   (12000 extra cycles)
What if a memory access is performed after “few” arithmetic instructions?
A^4 M schedule:   A A A A M A A A A M       (2 extra cycles)

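
The extra-cycle counts on this slide can be reproduced with a small calculation. The sketch assumes the latencies quoted on the earlier slide (1 cycle per arithmetic instruction, about 3000 per memory access) and a greedy schedule that issues real instructions whenever the current slot type matches; the function name and its parameters are hypothetical.

```python
def extra_cycles(program, n, a_cost=1, m_cost=3000):
    """Dummy cycles added when `program` runs under an A^n M schedule.

    `program` is a string of 'A' (arithmetic) and 'M' (memory) instructions.
    Each round of the schedule offers n arithmetic slots, then one memory slot.
    Trailing arithmetic after the last 'M' is not padded in this sketch.
    """
    dummy = 0
    used = 0  # arithmetic slots already used in the current round
    for op in program:
        if op == "A":
            if used == n:
                # No arithmetic slot left: the round ends with a dummy memory access.
                dummy += m_cost
                used = 0
            used += 1
        elif op == "M":
            # Pad the unused arithmetic slots, then do the real memory access.
            dummy += (n - used) * a_cost
            used = 0
        else:
            raise ValueError(f"unknown instruction type: {op!r}")
    return dummy
```

For the program `A A A A M A A M`, the naïve schedule (`n=1`) costs 12000 extra cycles and the A^4 M schedule (`n=4`) costs 2, matching the slide.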
Optimizations to the Scheme – 1. A^N M Scheduling
Ideally, N should be program independent
$N = \frac{\text{Memory Access Latency}}{\text{Arithmetic Access Latency}} = \frac{3000}{1}$
Example: A A A A M A A M
- 6006 cycles of actual work
- 2996 + 2998 < 6000 cycles of dummy work

Amount of dummy work < 50% of the total work
In other words, our scheme is 2x-competitive, i.e., in the worst case, it incurs ≤ 2x overhead relative to the best schedule with no dummy work

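
A short derivation sketch of the 2x bound, under assumptions that are mine rather than the slides': $a$ and $m$ denote the arithmetic and memory latencies, $N = m/a$, the schedule is greedy, and a trailing partial round is ignored.

```latex
% Assumed symbols (not from the slides): a = arithmetic latency,
% m = memory latency, N = m / a, greedy A^N M schedule.
\begin{align*}
\text{cost of one round} &= N a + m = 2m \\
\text{real work in a round} &\ge m
  && \text{(a real } M \text{, or } N \text{ real } A \text{s with } N a = m \text{)} \\
\text{dummy work in a round} &= 2m - \text{real work} \le m \le \text{real work} \\
\Rightarrow \quad \text{total work} &\le 2 \cdot \text{real work}
\end{align*}
```

In the slide's example, the two rounds contain 3004 and 3002 cycles of real work against 2996 and 2998 cycles of dummy work.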
Optimizations to the Scheme – 2. Using a Scratchpad
Why does a scratchpad help?
- Memory accesses served by scratchpad
Why not use regular hardware caches?
- Cache hit/miss reveals information as they are program independent
Program:
    void bwt-rle(char *a) {
        bwt(a, LEN);
        rle(a, LEN);
    }
    void main() {
        char *inp = readInput();
        for (i=0; i < len(inp); i+=LEN) {
            spld(inp + i, LEN, 0);
            len = bwt-rle(inp + i);
        }
    }

HOP Architecture
Variant of Path ORAM
- Freecursive ORAM
- PMMAC
- 64 byte block, 4 GB memory
Processor
1. Single stage 32b integer base
2. spld
Scratchpad sizes in figure: 512 KB, 16 KB
For efficiency, use stateful tokens

Evaluation – Speed-up over Baseline Scheme
- Scratchpad with A^N M: 3x – 238x better than baseline scheme
- A^N M scheme only: 1.5x – 18x better than baseline scheme

Slowdown Relative to Insecure Schemes
- Slowdown to Insecure: 8x-76x
- Slowdown to GhostRider: 2x-41x

Case Study: bzip2
bzip2: Compression algorithm
Performance does not vary much based on input, so perhaps “easy” to determine running time T
Two highly compressible strings:
- String S1: 106x speedup wrt baseline, 17x slowdown wrt insecure
- String S2: 234x speedup wrt baseline, 8x slowdown wrt insecure

Time for Context Switching
- Program State (program params): < 1 KB
- Memory State (ORAM state, auth): ~264 KB
- Execution State (cpustate, time): < 1 KB
- Scratchpads (Instruction, Data): ~528 KB
Data stored by token: ~800 KB
Assuming 10 GB/s, will require ~160 μs to swap state

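
The figures on this slide can be checked with a line of arithmetic. Two assumptions here are not stated in the talk: decimal units (1 KB = 1000 bytes, 1 GB = 10^9 bytes), and that a swap means writing one context out and reading another in, which is one reading that matches the ~160 μs figure. The function name is hypothetical.

```python
def transfer_time_us(size_kb, bandwidth_gb_per_s):
    """Time in microseconds to move size_kb kilobytes at the given bandwidth.

    Assumes decimal units: 1 KB = 1e3 bytes, 1 GB = 1e9 bytes.
    """
    seconds = (size_kb * 1e3) / (bandwidth_gb_per_s * 1e9)
    return seconds * 1e6


# Per-category sizes from the slide; "< 1 KB" is rounded up to 1 KB.
state_kb = 1 + 264 + 1 + 528            # about 794 KB, quoted as ~800 KB
one_way_us = transfer_time_us(800, 10)  # about 80 microseconds
# Assumption: a swap is one save plus one restore.
swap_us = 2 * one_way_us                # about 160 microseconds
```

Under these assumptions a single 800 KB transfer takes about 80 μs, and a save followed by a restore takes about 160 μs.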
Conclusion
We are among the first to design and implement a secure processor with a matching cryptographically sound formal abstraction (in the UC framework)
Paper will be on eprint soon. Code will be open sourced.
kartik@cs.umd.edu