Memory-hard functions and tradeoff cryptanalysis with applications to password hashing, cryptocurrencies, and white-box cryptography. Alex Biryukov, Dmitry Khovratovich, Johann Groschaedl (University of Luxembourg). Introduction: Passwords


  1. Pebble game Computation with space complexity S can be modelled as a pebble game with S pebbles: • A free pebble can be placed on an input vertex at any time; • A pebble can be removed at any time; • A free pebble can be placed on any vertex if all its predecessors are pebbled. • We win if we pebble all output vertices. [Animation: a small DAG is pebbled step by step from In to Out; the counter shows the number of free pebbles left.]
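The pebbling rules above can be sketched as a tiny replay-checker (a minimal sketch; the graph, move list, and function names are illustrative, not from the slides):

```python
# A minimal sketch of the pebble game on a DAG.
# 'preds' maps each vertex to its predecessors; inputs have no predecessors.
def play(preds, moves, budget):
    """Replay a sequence of ('place', v) / ('remove', v) moves with at most
    `budget` pebbles; return the set of currently pebbled vertices."""
    pebbled = set()
    for op, v in moves:
        if op == 'remove':
            pebbled.discard(v)
        else:  # a vertex may be pebbled only if all its predecessors are
            assert all(p in pebbled for p in preds[v]), f"{v} not ready"
            pebbled.add(v)
            assert len(pebbled) <= budget, "out of pebbles"
    return pebbled

# Path graph a -> b -> c: pebbled with only 2 pebbles by freeing 'a'.
preds = {'a': [], 'b': ['a'], 'c': ['b']}
moves = [('place', 'a'), ('place', 'b'), ('remove', 'a'), ('place', 'c')]
print(sorted(play(preds, moves, budget=2)))  # -> ['b', 'c']
```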

  15. Earlier results Early results on pebble games on k-in graphs: • Every graph with N vertices can be pebbled with c_k · N / log N pebbles [3]; • There exist graphs for which this bound is tight [4] and the time complexity is superpolynomial in N [5]. Time complexity bounds for pebble numbers between N / log N and N are unclear. [3] John E. Hopcroft, Wolfgang J. Paul, and Leslie G. Valiant. “On Time Versus Space”. In: J. ACM 24.2 (1977), pp. 332–337. [4] Wolfgang J. Paul, Robert Endre Tarjan, and James R. Celoni. “Space Bounds for a Game on Graphs”. In: Mathematical Systems Theory 10 (1977), pp. 239–251. [5] Thomas Lengauer and Robert Endre Tarjan. “Asymptotically tight bounds on time-space trade-offs in a pebble game”. In: J. ACM 29.4 (1982), pp. 1087–1130.

  16. Parallel pebbling Nowadays, multiple cores are available, hence we can precompute: [Figure: two pipelined rows of hash calls F over chains X and Y, computed in parallel.] The scheme latency is still 2N hash calls, since the input addresses are known in advance.

  17. Memory-hardness from superconcentrators Superconcentrators: several layers; every set of l input and l output vertices has l vertex-disjoint paths between them. Stacks of superconcentrators exhibit nice tradeoffs [6]: T = α^O(α) · N. Stacks of superconcentrators are interesting candidates for a memory-hard function, but the overhead is too large (40+ layers for 1 GB of RAM). [6] Thomas Lengauer and Robert Endre Tarjan. “Asymptotically tight bounds on time-space trade-offs in a pebble game”. In: J. ACM 29.4 (1982), pp. 1087–1130.

  19. Resilience to parallel hashing Another problem of superconcentrators: if N cores are available, the time complexity is only log N. There are graphs of size N that can not be efficiently parallelized [7]. However, they consist of log^5 N layers, which is prohibitively slow for a memory size of 100 MB. [7] Joël Alwen and Vladimir Serbinenko. “High Parallel Complexity Graphs and Memory-Hard Functions”. In: IACR Cryptology ePrint Archive, Report 2014/238.

  20. Data-dependent addressing: Scrypt

  21. Scrypt Scrypt [8]: hashing with data-dependent addressing: • Sequential initialization: X[i] ← H(X[i−1]); • Pseudo-random walk on X[] (previously suggested by Dwork et al. [9]): for 1 ≤ i ≤ N: A ← H(A ⊕ X[A]). • Used in the Litecoin cryptocurrency with moderate N. [Figure: the pointer A jumps to a pseudo-random block X[A] of the array X[].] [8] Colin Percival. “Stronger key derivation via sequential memory-hard functions”. In: (2009). http://www.tarsnap.com/scrypt/scrypt.pdf. [9] Cynthia Dwork, Moni Naor, and Hoeteck Wee. “Pebbling and Proofs of Work”. In: CRYPTO 2005.
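The two loops can be sketched as follows (a toy model: BLAKE2b stands in for scrypt's actual BlockMix/Salsa20 core, and block sizes are illustrative):

```python
# Toy sketch of scrypt's ROMix structure: sequential fill, then a
# data-dependent pseudo-random walk over the filled array.
from hashlib import blake2b

def H(data):
    return blake2b(data, digest_size=32).digest()

def romix_sketch(password, N):
    # Sequential initialization: X[i] = H(X[i-1])
    X = [H(password)]
    for i in range(1, N):
        X.append(H(X[i - 1]))
    # Pseudo-random walk: A = H(A xor X[A mod N]), address taken from A itself
    A = X[-1]
    for _ in range(N):
        j = int.from_bytes(A[:8], 'little') % N
        A = H(bytes(a ^ b for a, b in zip(A, X[j])))
    return A

tag = romix_sketch(b"password", N=1024)
print(tag.hex())
```

The walk is what makes the addressing data-dependent: the index j cannot be known before A is computed, which is exactly what a tradeoff attacker has to work around.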

  22. Scrypt Problems: • Too many parameters and subfunctions; • Allows trivial tradeoff: ST = O ( N 2 ); • ASIC implementations demonstrate 1000x efficiency improvement; • Might be subject to cache-based timing attacks 10 . 10 Daniel J. Bernstein. Cache-timing attacks on AES . Tech. rep. http://cr.yp.to/antiforgery/cachetiming-20050414.pdf . 2005.

  23. Open problems: 1. What are the most efficient memory-hard functions? 2. Do they have to be data-dependent? 3. What are the best tradeoffs we can get?

  25. PHC Password Hashing Competition (2014-2015): a struggle to find faster, more secure, more universal schemes. • 22 schemes in the competition; • The vast majority claim resilience to GPU/ASIC cracking; • Only a few really tried to attack their own schemes (standard practice in cryptographic design); • We show how to improve such attacks; • And we will see how ASIC-equipped adversaries can exploit them. We considered three schemes, which have come out of the academic crypto community and have clear documentation.

  26. Catena

  28. Bit-reversal permutation Bit-reversal permutation [11]: • two layers; vertex i_1 i_2 ··· i_n is connected with i_n ··· i_2 i_1. [Figure: inputs 000…111 wired to their bit-reversed outputs.] Tradeoff: ST = O(N^2); or T = αN, where α = N/S is the memory reduction. Can be computed with √N memory and time 2N on multiple cores [12]. • Stack of such permutations? [11] Thomas Lengauer and Robert Endre Tarjan. “Asymptotically tight bounds on time-space trade-offs in a pebble game”. In: J. ACM 29.4 (1982), pp. 1087–1130. [12] Joël Alwen and Vladimir Serbinenko. “High Parallel Complexity Graphs and Memory-Hard Functions”. In: IACR Cryptology ePrint Archive, Report 2014/238.
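The index map itself is simple to state in code (a hypothetical helper, matching the 3-bit example above):

```python
# Bit-reversal permutation on n-bit indices.
def bit_reverse(i, n):
    """Reverse the n-bit binary representation of i."""
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

print([bit_reverse(i, 3) for i in range(8)])
# -> [0, 4, 2, 6, 1, 5, 3, 7]
```

Note that the map is an involution: applying it twice returns the original index, which is what the slide's level-(T−2) observation about Catena exploits.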

  30. Catena-λ Catena [13]: • Stack of λ bit-reversal permutations (λ = 3, 4): V_L[ABC] = H(V_L[ABC − 1], V_{L−1}[(ABC)^R]), where (·)^R denotes bit reversal. [Figure: three layers of 3-bit indices wired by bit reversal.] • Full-round hash function (Blake2); • Proof of tradeoff resilience (an extension of the Lengauer-Tarjan proof for λ = 1): S^λ · T = Ω(N^(λ+1)). Memory fraction 1/q should imply penalty q^λ. [13] Christian Forler, Stefan Lucks, and Jakob Wenzel. “Catena: A Memory-Consuming Password Scrambler”. In: IACR Cryptology ePrint Archive, Report 2013/525.
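A compact sketch of this structure, assuming BLAKE2b in place of full Blake2 and toy parameters (names and sizes are illustrative):

```python
# Toy sketch of Catena-lambda: lambda stacked bit-reversal layers, each row
# hashed from its left neighbour and the bit-reversed row one level below.
from hashlib import blake2b

def H(*parts):
    h = blake2b(digest_size=64)
    for p in parts:
        h.update(p)
    return h.digest()

def bit_reverse(i, n):
    r = 0
    for _ in range(n):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def catena_sketch(pwd, n, lam):
    N = 1 << n
    v = [H(pwd, i.to_bytes(4, 'little')) for i in range(N)]  # level 0
    for _ in range(lam):                                     # lambda layers
        w = [H(v[bit_reverse(0, n)])]
        for j in range(1, N):
            w.append(H(w[j - 1], v[bit_reverse(j, n)]))
        v = w
    return v[-1]

print(catena_sketch(b"pwd", n=4, lam=3).hex()[:16])
```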

  33. Observation Apparently, the proof has a flaw. [Figure: four stacked bit-reversal layers on 4-bit indices.] • Consider vertices [AB0], [AB1], [AB2], …, where B has n − 2k bits and the other letters are k-bit; • To compute [ABC] at level T, we need its bit reversal [C^R B^R A^R] at level T − 1; • [C^R B^R A^R] in turn refers to [ABC] at level T − 2. • Note that the middle part is always either B or B^R.

  34. Tradeoff cryptanalysis [Figure: the same four stacked bit-reversal layers.] Efficient computation of [AB∗] at level 4: • Suppose that we have stored all vertices [∗∗0] at all levels (2^(n−k) vertices per level); • Compute [∗B∗] at level 0 (2^(2k) steps); • Use these values to compute [∗B∗] at level 1 (2^(2k) steps); • Use these values to compute [∗B∗] at level 2 (2^(2k) steps); • Use these values to compute [∗BA] at level 3 (A · 2^k steps); • Use these values to compute [AB∗] at level 4 (2^k steps). In total 3.5 · 2^(2k) hashes for 2^k vertices.
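The step counts above can be tallied in a few lines (illustrative bookkeeping only; A is averaged over its 2^k possible values):

```python
# Tally the hash calls of the attack above as a function of k.
def attack_hashes(k):
    levels_0_to_2 = 3 * 2**(2 * k)   # levels 0, 1, 2: 2^(2k) steps each
    level_3 = 2**(2 * k) / 2         # A * 2^k steps; A averages 2^(k-1)
    level_4 = 2**k                   # 2^k steps
    return levels_0_to_2 + level_3 + level_4

# The total approaches 3.5 * 2^(2k) as k grows.
print(attack_hashes(10) / 2**20)
```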

  36. Cryptanalysis-II Eventually we have the following penalties for l < n/3 − 2:

  Memory fraction | Catena-3  | Catena-4
  1/2             | 7.4       | 13.1
  1/4             | 15.5      | 26.1
  1/8             | 30.1      | 51.5
  1/2^l           | 2^(l+1.9) | 2^(l+2.7)

  So the penalty is about 4q for memory fraction 1/q. Tradeoff for Catena-3: ST ≤ 16 N^2.

  37. Argon

  38. Argon Argon [14]: [Figure: a grid of Mix operations over the memory matrix.] Blockcipher-based design: • an n × 32 matrix of 16-byte blocks; • row-wise nonlinear transformation (48 reduced AES cores and a linear layer) with a guaranteed branch number (at least 8 inputs affect 1 output); • column-wise permutation (n data-dependent swaps based on the RC4 permutation). [14] Alex Biryukov and Dmitry Khovratovich. Argon: password hashing scheme. Tech. rep. https://www.cryptolux.org/images/0/0c/Argon-v1.pdf. 2014.

  41. Tradeoff [Figure: the Mix grid.] When trying to attack, apply the following strategy: • Store permutations, not blocks (about 1/2 of total memory); • When an element is needed, recompute it; • Parallelize the RC4 permutation: ≈ 250 elements can be read in parallel without bank collisions. [Figure: recomputation tree: 512 AES calls; 128 AES calls and 64 lookups; 16 AES calls and 8 lookups; 2 AES calls and 1 lookup.] On the last level, one memory access is replaced with a depth-7 tree of 5-round AES calls, which increases the latency by a few times.

  42. Further tradeoff [Figure: the same recomputation tree over the Mix grid.] If the last permutation can not be stored, it has to be recomputed each time we need an element: a 2^18-fold increase in latency.

  43. Computational penalties Penalties depend slightly on the memory size:

  Fraction \ Memory | 16 MB | 128 MB | 1 GB
  1/2               | 139   | 160    | 180
  1/4               | 2^18  | 2^26   | 2^34
  1/8               | 2^31  | 2^36   | 2^47

  Tradeoff: N^4 · T = (cN)^(3N/M)

  44. Lyra2

  45. Lyra2 Lyra2 [Simplicio-Almeida-Andrade-dos Santos-Barreto’13]: an R × 64 matrix of 64-byte blocks. Two phases: • Setup phase: deterministic generation and update of rows; • Wandering phase (T ≥ 1 iterations): sequential and pseudorandom updates in parallel. Claims high speed (1.2 GB/sec for T = 1).

  46. Setup phase F: a stateful function with a 128-byte state (a sponge construction based on the hash function Blake2b). M[i] ← F(M[i−1], M[2k − i]); M[2k − i] ⊕= M[i]. [Figure: rows M[5]…M[9] feed F, which updates rows M[14], M[11], M[12].]

  47. Setup phase Overall picture: [Figure: the memory matrix, rows 0 through R.]

  48. Tradeoff analysis: Setup phase Strategy: • Store the first 2^l rows; • Store every q-th row. Then q consecutive rows are determined from q(r − l) previous rows, which are precomputed. [Figure: rows indexed 2^l through 2^r − 1; every q-th row 2^r − iq is kept.] The Setup phase can be computed with little penalty and memory.

  49. Wandering phase Wandering phase: M[i] ← F(M[i−1], M[r_i]); M[r_i] ⊕= M[i]. Here r_i is a pseudorandom function of M[i−1] (i.e. determined only at the time of computation). [Figure: at time i, block i reads blocks i − 1 and r[i].]
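Both phases can be sketched together (a toy model: BLAKE2b replaces Lyra2's duplex sponge, and the setup indexing here is a simplification of the actual schedule):

```python
# Toy sketch of Lyra2's two phases over R rows of 64 bytes.
from hashlib import blake2b

def F(*parts):
    h = blake2b(digest_size=64)
    for p in parts:
        h.update(p)
    return h.digest()

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def lyra2_sketch(pwd, R, T=1):
    M = [F(pwd, b'\x00')]
    for i in range(1, R):                 # Setup: deterministic updates
        j = (R - i) % i if i > 1 else 0   # simplified second row index
        M.append(F(M[i - 1], M[j]))
        M[j] = xor(M[j], M[i])
    for t in range(T * R):                # Wandering: data-dependent walk
        i = t % R
        r = int.from_bytes(M[i - 1][:8], 'little') % R  # r_i from M[i-1]
        M[i] = F(M[i - 1], M[r])
        M[r] = xor(M[r], M[i])
    return M[-1]

print(lyra2_sketch(b"pwd", R=64).hex()[:16])
```

The key point for the tradeoff analysis is visible in the code: r is read out of M[i−1], so an attacker who discarded M[r] only learns which row to recompute at the last moment.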

  50. Tradeoff analysis: Wandering phase The pseudo-random dependencies seem to impose prohibitive penalties: [Figure: at time i, block i depends on blocks i − 1 and r[i], which spawn recomputation trees.] Trees may cover the entire matrix.

  51. Tradeoff analysis: Wandering phase First idea: split the computation into levels and store all links within a level. [Figure: the computation split into levels at 0, n/4, n/2, 3n/4, n.]

  52. Tradeoff analysis: Wandering phase Second idea: store everything that refers to the most expensive rows (keep a list). [Figure: the top-10% most expensive rows flagged in the matrix of rows 1 through R.]

  53. Tradeoff analysis: Wandering phase Third idea: note that rows are updated column-wise. This is good for the CPU cache, but even better for ASIC-equipped adversaries. • Store the initial state of each row; • Compute the new row column-wise; • So the extra latency is introduced before the first column only. [Figure: recomputation trees of depth d feeding f; delay d before the first column, delay 1 for the subsequent ones.]

  54. Penalties Penalties:

  Setup phase                  Wandering phase (T = 1)
  Memory fraction | Penalty    Memory fraction | Penalty
  1/2             | 1.5        1/2             | 2
  1/4             | 2          1/4             | 6.6
  1/8             | 3          1/8             | 111.7
  1/16            | 4          1/16            | 2^16

  When we combine the two phases, we count how many intervals of length q are accessed in the Wandering phase. Total:

  Memory fraction | Penalty
  1/2             | 118
  1/3             | 602
  1/4             | 2241
  1/6             | 14801

  55. Overall Catena, Argon, and Lyra2 tradeoffs for 1 GB:

  Memory fraction | Catena-3 | Argon  | Lyra2 (T = 1)
  1/2             | 7.4      | 180    | 118
  1/3             | 11.2     | 2^29.5 | 602
  1/4             | 15.5     | 2^34   | 2241
  1/8             | 30.1     | 2^47   | 2^18

  56. Optimal ASIC implementations

  58. Password crackers History of password crackers: • 1970s-90s: regular desktops; • 2000s: GPUs and FPGAs; • 2010s: dedicated hardware? Let us figure out how a rich adversary would build his password cracker.

  60. ASIC ASIC (application-specific integrated circuit): dedicated hardware. • Large design costs (millions of dollars); • Production costs are high in small quantities; • The most energy-efficient systems. When passwords are of high value, an adversary may want to design a password-cracking chip. • Parallelism in computations; • Parallelism in memory access (very difficult on all other architectures); • In the long term, electricity will dominate the costs. So let us minimize the energy needed to test a single password.

  61. Straightforward implementation A straightforward implementation of a password hashing scheme typically has a huge memory block and a small block of computational cores. [Figure: chip layout with a large Memory area and a small Core area.]

  63. Tradeoff implementation Less memory, more computations: [Figure: a smaller Memory area plus extra recomputation cores g next to the main Core.] Time may not grow: • If transformations are data-independent, they can be precomputed. Protection against cache-based timing attacks thus makes a scheme more vulnerable to tradeoff attacks. • Data-dependent transformations introduce some latency. However, at the other tree levels all data dependencies are known.

  64. Tradeoff evaluation What determines the cracking cost? The following metrics can be used: • Computational complexity (total number of operations). Rather easy to compute, but inaccurate for memory-hard functions. • Time × area. A good approximation to energy consumption if all elements consume the same energy. Requires the latencies and area requirements of all operations. • Energy cost. More relevant when idle memory, active memory, and logic consume different power (the case for static RAM). Requires the energy requirements of all elements.

  65. Our assumptions So far no one has placed that much memory on a single ASIC, so the exact behaviour of such a chip is unknown. We make the following assumptions: • Static RAM is more energy-efficient; • The memory can be partitioned into 2^16 banks (two levels of hierarchy); • All banks can be read and written in parallel with an average latency of 3 cycles; • We ignore the area of the communication wires between memory and computational cores: OK for our 2^16 memory banks and not so many cores, but it can be a problem for a much denser structure.

  66. Energy model Energy model (static RAM power consumption): E = L·T + N_M·E_M + N_C·E_C, where L is the memory leakage power, T the total time of the scheme, N_M the number of memory accesses, E_M the energy per memory access, N_C the number of hash calls, and E_C the energy per hash call. Three main contributors to the energy cost: • Leakage power of static RAM; • Memory access energy; • Hash computation energy.
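The model is just a weighted sum; a small calculator makes the units explicit (the plugged-in numbers below are made up for illustration):

```python
# The energy model E = L*T + N_M*E_M + N_C*E_C as a calculator.
def energy_mj(leak_mw, time_s, mem_accesses, e_access_mj, hash_calls, e_hash_mj):
    """Total energy in millijoules: leakage + memory accesses + hash calls.
    (mW * s = mJ, so the leakage term needs no conversion.)"""
    return leak_mw * time_s + mem_accesses * e_access_mj + hash_calls * e_hash_mj

# Hypothetical cracker run: 800 mW leakage for 0.5 s,
# 2^26 memory accesses and 2^26 hash calls at 2^-20 mJ each.
E = energy_mj(leak_mw=800, time_s=0.5,
              mem_accesses=2**26, e_access_mj=2**-20,
              hash_calls=2**26, e_hash_mj=2**-20)
print(f"{E:.0f} mJ")  # -> 528 mJ
```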

  67. Reference platform We take the best implementations and scale them to the reference platform: 65 nm CMOS technology, 1.2 V supply voltage, 400 MHz frequency. • AES: scaling down a 22 nm, 1 GHz implementation, 1 cycle per round; • Blake2b: scaling up and doubling a 90 nm, 286 MHz implementation of Blake-32, 2 cycles per round; • Static RAM: a 65 nm, 850 MHz implementation.

  68. Reference platform

  Primitive                  | Power   | Area     | Latency (cycles)
  AES (full)                 | 32 mW   | 17.5 kGE | 10
  Blake2b (full)             | 13.3 mW | 19 kGE   | 20
  16 KB × 32-bit memory bank | 12.6 µW | 192 kGE  | 3

  Operation                   | Energy
  1 Gcall (2^30) of AES       | 800 mJ
  1 Gcall of Blake2b          | 867 mJ
  1 GB of memory reads/writes | 1 mJ

  Therefore, an AES core is equivalent to about 700 bytes in area. One run of AES costs as much as reading 800 bytes.
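The last claim follows directly from the energy table (a back-of-the-envelope check):

```python
# Energy of one AES call vs. energy per byte of memory access,
# using the table's figures: 800 mJ per 2^30 AES calls, 1 mJ per GB.
aes_j_per_call = 800e-3 / 2**30   # joules per AES call
mem_j_per_byte = 1e-3 / 2**30     # joules per byte read or written
print(round(aes_j_per_call / mem_j_per_byte))  # -> 800
```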
