  1. Secure Computation of MIPS Machine Code Gordon, Katz, McIntosh, Wang

  2. Efficiency vs. Generality
  [Diagram: a spectrum from efficiency to generality, covering:]
  • Constructions tailored towards particular applications.
  • Domain-specific languages that approximate certain high-level languages.
  • Machine code / legacy code.

  3. Legacy Code
  Moving to the RAM model offers the possibility of securely emulating real architectures. In theory, we can support “real” languages, their existing libraries, and existing compilers. What would this take in practice? Ideal world: the programmer has never heard the words “secure computation”.

  4. Oblivious RAM [GO96, …]
  The client stores its data (v1, d1), (v2, d2), …, (vn, dn) at the server.
  Access pattern 1: (r, v5), (r, v2), (w, v2, d1), …, (w, v7, d2)
  Access pattern 2: (r, v1), (r, v1), (r, v1), …, (r, v1)
  ANY two access patterns are indistinguishable.
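
A minimal sketch of the access-pattern guarantee (not the [GO96] construction itself): the trivial "linear scan" ORAM touches every stored item on every access, so any two access patterns of the same length look identical to the server. The class and method names below are illustrative, not from the talk.

    # Trivial linear-scan ORAM: every access touches every slot, so the
    # server-visible pattern is independent of which item is accessed.
    class LinearScanORAM:
        def __init__(self, n):
            self.data = [None] * n

        def access(self, op, index, value=None):
            result = None
            for i in range(len(self.data)):   # always scan all n slots
                if i == index:
                    result = self.data[i]
                    if op == "w":
                        self.data[i] = value
                else:
                    _ = self.data[i]          # dummy read, identical pattern
            return result

The cost is a full scan per access; the point of real ORAM constructions is to get the same hiding property with polylogarithmic overhead.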

  5. ORAM in Secure Computation
  Who should hold the ORAM? Recall that the client fetches items from the server. Alice shouldn’t see Bob’s items, and Bob shouldn’t see Alice’s.
  [Diagram: Alice (input x) and Bob (input y) jointly compute F(x, y); where does the ORAM sit?]

  6. ORAM in Secure Computation
  But even if Alice sees only which of her own items are fetched, she learns something about y. (Consider a binary search for y among the items in x.)
  [Diagram: Alice (input x) and Bob (input y) compute F(x, y); each party holds an ORAM.]

  7. Oblivious RAM (abstraction)
  [Diagram: to read logical address v, start from state(0) and apply the ORAM (READ) circuit log n times. Step i maps state(i-1) to a physical address v(i) and a new state(i). The locations v(1), …, v(log n) are then fetched; exactly 1 of the log n will match, and its data D is output.]
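
A minimal sketch of that abstraction (illustrative names and types, not the actual ORAM circuit): a read is log n applications of a small next-address step, followed by selecting the one fetched item whose tag matches the logical address.

    # oram_step: (state, logical address) -> (new state, physical address)
    # fetch:     physical address -> (tag, data) stored at that location
    def oram_read(oram_step, fetch, state, v, levels):
        fetched = []
        for _ in range(levels):                # levels = log n
            state, phys = oram_step(state, v)  # next physical location to touch
            fetched.append(fetch(phys))
        # exactly one of the log n fetched items matches; its data is D
        data = next(d for tag, d in fetched if tag == v)
        return data, state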

  8. CCS 2012
  [Diagram: the parties hold XOR “secret shares” of the logical address and of the ORAM state: v1 ⊕ v2 = v and state1 ⊕ state2 = state. At each of the log n steps, the shares of the state are fed into the ORAM circuit, evaluated under Yao's protocol, producing the physical address v(i) for that step and fresh shares state1(i), state2(i) of the updated state. After the final step the parties hold shares D1 and D2 of the fetched data.]

  9. CCS 2012
  [Diagram: the same construction, now interleaved with the program itself (here, binary search). The parties' inputs are (state1(0), inst1) and (state2(0), inst2); between ORAM steps, the program circuit is also evaluated under Yao's protocol. After the log n steps the parties hold (state1(log n), D1) and (state2(log n), D2).]
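
A minimal sketch of the XOR secret sharing the diagrams rely on (illustrative helper names, not the CCS 2012 code): each party keeps one share, and the value is reconstructed only inside the garbled circuit.

    import secrets

    def share(value, bits=32):
        """Split value into two XOR shares of the given bit width."""
        r = secrets.randbits(bits)
        return r, value ^ r

    def reconstruct(s1, s2):
        """Recombine shares (conceptually done only inside the circuit)."""
        return s1 ^ s2

    # state1, state2 = share(state)   # Alice keeps state1, Bob keeps state2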

  10. Current Work (DARPA: PROCEED)
  [Diagram: the MIPS architecture emulated one cycle at a time by three circuits evaluated under Yao's protocol (built with ObliVM): INSTRUCTION FETCH, CPU, and LOAD/STORE WORD. Each cycle consumes the 32 registers, the program counter, and the current instruction, and produces their new values.]

  11. Current Work
  [Diagram: the same MIPS / Yao pipeline as above: INSTRUCTION FETCH, CPU, LOAD/STORE WORD.]

  12. Current Work: Why MIPS?
  • Fixed register space = fixed circuit.
  • With approximately 15 instructions, we can compute: Dijkstra, longest common sub-string, set-intersection, stable marriage, binary search, decision trees…
  • Easy to implement!
  • 15 instructions = small circuit.
  • We first proposed LLVM, but instructions in LLVM are polymorphic objects.
  • On the other hand: ARM or x86 would give bigger circuits, but smaller programs. Ultimately, I don’t know which is best.

  13. Current Work
  [Diagram: the MIPS / Yao pipeline again: per cycle, INSTRUCTION FETCH, CPU, and LOAD/STORE WORD over the 32 registers, program counter, and current instruction.]
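
A minimal sketch of the per-cycle structure the diagrams describe, written as plain Python rather than as a garbled circuit; the tiny instruction set, tuple encoding, and the program/memory inputs here are illustrative, not the talk's 15-instruction MIPS subset.

    def step(program, memory, regs, pc):
        """One emulated cycle: instruction fetch, CPU/ALU, then load/store."""
        op, a, b, c = program[pc]                 # INSTRUCTION FETCH
        if op == "addi":                          # CPU / ALU
            regs[a] = (regs[b] + c) & 0xFFFFFFFF
        elif op == "beq":
            if regs[a] == regs[b]:
                return regs, pc + c               # branch taken
        elif op == "lw":                          # LOAD/STORE WORD
            regs[a] = memory[regs[b] + c]
        elif op == "sw":
            memory[regs[b] + c] = regs[a]
        return regs, pc + 1

    # regs, pc = [0] * 32, 0
    # while program[pc][0] != "halt":
    #     regs, pc = step(program, memory, regs, pc)

In the secure setting, one such circuit is evaluated at every cycle, which is why the per-component circuit sizes on the next slide matter.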

  14. Component Run-Times
  ALU, 15 instructions → 7K AND gates.
  Memory fetch from 1024 32-bit words → 43K AND gates.

  15. Improvement #1: Instruction Mapping
  Divide all instructions into separate “banks”. Bank i contains the instructions that could be executed in the i-th cycle.

  16. Instruction Mapping

  If (x > 5)
      instr1
      instr2
  else
      instr3
      instr4

  If x is tainted: instr1 and instr3 must go in the same ORAM bank, and instr2 and instr4 must go in the same ORAM bank.

  for (i = 1 to x)
      instr1
      instr2
  end for
  instr3
  instr4
  instr5

  t = 1: instr1
  t = 2: instr2
  t = 3: instr1 or instr3
  t = 4: instr2 or instr4
  t = 5: instr1 or instr3 or instr5

  Loop size t, program length n: n/t banks, each of size t.
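
A minimal sketch of how such banks could be computed statically (the successor map and function name are illustrative, not from the talk): propagate the set of reachable program counters one cycle at a time.

    def instruction_banks(successors, entry, steps):
        """successors[pc] = set of possible next pcs; one bank per time step."""
        banks, reachable = [], {entry}
        for _ in range(steps):
            banks.append(reachable)
            reachable = {nxt for pc in reachable
                         for nxt in successors.get(pc, set())}
        return banks

    # The loop example above: instr1, instr2 repeat; instr3, instr4, instr5 follow.
    succ = {1: {2}, 2: {1, 3}, 3: {4}, 4: {5}, 5: set()}
    for t, bank in enumerate(instruction_banks(succ, entry=1, steps=5), start=1):
        print(t, sorted(bank))
    # t=1: [1]  t=2: [2]  t=3: [1, 3]  t=4: [2, 4]  t=5: [1, 3, 5]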

  17. Instruction Mapping
  [Diagram: the MIPS / Yao pipeline, annotated. INSTRUCTION FETCH: wade through fewer instructions! CPU: reduce the number of instructions! LOAD/STORE WORD: skip this on MANY steps!]

  18. Instruction Mapping
  Reduce the number of instructions! Set intersection: instruction mapping reduces the average ALU size from 6727 to 1848 AND gates (3.5X).
  [Diagram: the CPU circuit in the MIPS / Yao pipeline.]

  19. Instruction Mapping
  Wade through fewer instructions! Set intersection: the full program has about 150 instructions. The largest instruction bank after mapping has 31 instructions. More than half the instruction banks have fewer than 20 instructions.
  [Diagram: the INSTRUCTION FETCH circuit in the MIPS / Yao pipeline.]

  20. Instruction Mapping
  Unfortunately, even after instruction mapping, a load/store operation might still occur in almost every time step.
  [Diagram: the LOAD/STORE WORD circuit — we would like to skip this on MOST steps!]

  21. Improvement #2: Padding

  for (i = 1 to x)
      If (x > 5)
          instr1
          instr2
      else
          instr3
          instr4
          instr5

  If the two branches have relatively prime lengths, one of length k1 and the other of length k2, then in fewer than k1·k2 time-steps we will cover the entire loop. By padding branches so that the lengths are relatively composite (i.e., share a common factor), we can greatly reduce the number of instructions per bank: for set intersection, we go from ~40 down to 4.
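
A minimal sketch of why padding helps, using the loop above (the helper name and the "one branch per iteration" model are illustrative): with branch lengths 2 and 3, which are coprime, the set of positions the program might occupy at a given time step grows until it covers the whole loop body; padding the short branch with one NOP makes both branches length 3, and the set stays small.

    def positions_per_step(branch_lengths, steps):
        """Possible (branch, offset) positions at each time step for a loop
        whose body is one branch of the given lengths, chosen per iteration."""
        states = {(b, 0) for b in range(len(branch_lengths))}
        history = []
        for _ in range(steps):
            history.append(states)
            nxt = set()
            for b, off in states:
                if off + 1 < branch_lengths[b]:
                    nxt.add((b, off + 1))           # continue within this branch
                else:
                    nxt.update((b2, 0) for b2 in range(len(branch_lengths)))  # next iteration
            states = nxt
        return history

    before = positions_per_step([2, 3], steps=12)   # coprime branch lengths
    after  = positions_per_step([3, 3], steps=12)   # padded with one NOP
    print(max(len(s) for s in before), max(len(s) for s in after))  # 5 2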

  22. Padding
  We padded 2 of the 3 branches that appear in the main loop, using a total of 6 NOP instructions. Before padding, we found that a load/store operation might be executed in almost every time step. After padding, only 1/10 of all time steps require a load/store operation.
  [Diagram: the LOAD/STORE WORD circuit — now skipped on MOST steps!]

  23. Set Intersection
  [Figure: run-time decomposition for computing set-intersection size when each party's input consists of 64 32-bit integers.]
  [Figure: run-time decomposition for computing set-intersection size when each party's input consists of 1024 32-bit integers.]

  24. Set Intersection

  25. Binary Search
  [Figure: comparing the performance of secure binary search. One party holds an array of 32-bit integers, while the other holds a value to search for.]

  26. Decision Trees

  27. A True Universal Circuit
  One more benefit of the general approach: we have a true universal circuit!
  1. Compile the private input function to MIPS.
  2. Supply a function pointer as input to the emulator.
  3. Our optimizations no longer apply: the analysis leaks information.

  28. Thanks!
