Simple High-Level Code For Cryptographic Arithmetic – With Proofs, Without Compromises


  1. Simple High-Level Code For Cryptographic Arithmetic – With Proofs, Without Compromises
     Andres Erbsen, Jade Philipoom, Jason Gross, Robert Sloan, Adam Chlipala
     MIT CSAIL
     github.com/mit-plv/fiat-crypto

  2. Finite Field Arithmetic
     ● Important for elliptic-curve cryptography
       – TLS, Signal, SSH...
     ● Performance-sensitive
     ● Hand-coded for each modulus, CPU word size
       – Widely implemented: P-256 and Curve25519
     ● Persistent concerns about correctness

  4. In [OpenSSL multiplication modulo P-256] there are a number of comments saying "doesn't overflow". Unfortunately, they aren't correct. Got math wrong :-(. [fix] attached. [unclear if existing attacks can exploit this] [still wrong; counterexample] [...] Attached. A little bit worse performance on some CPUs. It's good for ~6B random tests. [...] I think we can safely say that there aren't any low-hanging bugs left.

  9. Our Library
     ● Reusable, parametric implementations
     ● Automatically specialized to parameter values
     ● One computer-checkable correctness proof
     ● Deployed to billions of users with BoringSSL

  11. Demo: push-button code generation (Curve25519 for 32-bit CPUs)

  24. Cycles / Curve25519 Operation (bar chart; on a Broadwell laptop, as of time of submission)
      ● Bars: Generic (GMP), Specialized C, Our library, Specialized assembly
      ● Cycle counts shown (not in bar order): 154982, ~750000, 152195, 121444

  25. Modulus-Specific Representations
      ● Important driver of specialized implementation
      ● Break one field element into multiple digits
        – mod $2^{255} - 19$: $x = 2^{256} \cdot x_4 + 2^{192} \cdot x_3 + 2^{128} \cdot x_2 + 2^{64} \cdot x_1 + x_0$
        – mod $2^{127} - 1$: $x = 2^{127} \cdot x_3 + 2^{85} \cdot x_2 + 2^{43} \cdot x_1 + x_0$ ($x_2$: 42 bits, $x_1$: 42 bits, $x_0$: 43 bits)
      ● Later: how to use this to speed up modular reduction
      ● Key challenge: generalizing algorithms across representations
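
As a minimal sketch of the digit layout for $2^{127} - 1$, the Coq snippet below splits a value below $2^{127}$ into limbs of 43, 42, and 42 bits and checks that the weighted sum recovers it. The helper names `to_limbs` and `of_limbs` are hypothetical and do not come from the slides; only the limb widths and weights do.

```coq
From Coq Require Import ZArith.
Local Open Scope Z_scope.

(* Hypothetical helper: x0 holds 43 bits, x1 and x2 hold 42 bits each. *)
Definition to_limbs (x : Z) : Z * Z * Z :=
  (x mod 2^43, (x / 2^43) mod 2^42, (x / 2^85) mod 2^42).

(* Hypothetical helper: weighted sum with weights 1, 2^43, 2^85. *)
Definition of_limbs (l : Z * Z * Z) : Z :=
  let '(x0, x1, x2) := l in
  2^85 * x2 + 2^43 * x1 + x0.

(* The sample value is below 2^127, so three limbs suffice. *)
Example round_trip :
  of_limbs (to_limbs 123456789012345678901234567890)
  = 123456789012345678901234567890.
Proof. vm_compute. reflexivity. Qed.
```

The unequal limb widths are what make the weights line up with the modulus: 43 + 42 + 42 = 127, so the next weight after $2^{85}$ is exactly $2^{127}$.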

  27. Our Algorithm-Centric Workflow
      ● Template implementation: Let reduce s c p := let (lo, hi) := split s p in add lo (mul c hi).
      ● Specification: mulmod a b := a * b mod m
      ● Proof
      ● Pipeline: Parameter Selection → Specialization → Micro-optimization → cc

  28. Focus of This Talk (same workflow diagram as slide 27)

  29. Compile-time Associational Representation
      ● $876 = 8 \cdot 10^2 + 7 \cdot 10 + 6 \cdot 1$
      ● Let example := [(10^2, 8); (10, 7); (1, 6)].
      ● Let eval ls := sum (map (fun '(a, x) => a*x) ls).
      ● $876 = 4 \cdot 200 + 5 \cdot 10 + 16 \cdot 1 + 1 \cdot 10$
      ● Later: conversion to standard representation
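
The two decompositions of 876 can be checked mechanically. This is a sketch that assumes only the Coq standard library: the slide's `sum` is replaced here by `fold_right Z.add 0`, which is an assumption about its meaning, and the example names are made up for illustration.

```coq
From Coq Require Import ZArith List.
Import ListNotations.
Local Open Scope Z_scope.

(* eval sums weight * value over all (weight, value) pairs.
   Assumption: the slide's [sum] behaves like [fold_right Z.add 0]. *)
Definition eval (ls : list (Z * Z)) : Z :=
  fold_right Z.add 0 (map (fun '(a, x) => a * x) ls).

(* The standard decimal decomposition of 876. *)
Example eval_standard : eval [(10^2, 8); (10, 7); (1, 6)] = 876.
Proof. reflexivity. Qed.

(* A redundant decomposition: odd weights, a repeated weight, a digit above 9. *)
Example eval_redundant : eval [(200, 4); (10, 5); (1, 16); (10, 1)] = 876.
Proof. reflexivity. Qed.
```

Both lists denote the same number, which is the point of the associational representation: order, duplicate weights, and digit sizes are unconstrained until conversion to a standard form.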

  31. Example: Schoolbook Multiplication
      a = [(100,3); (10,2); (1,1)]
      b = [(10,7); (1,6)]

             3   2   1
            18  12   6   × 6
            21  14   7   × 7

      ab = [(100, 18); (10, 12); (1, 6); (1000, 21); (100, 14); (10, 7)]

      Definition mul (p q : list (Z*Z)) : list (Z*Z) :=
        concat (map (fun '(a, x) =>
          map (fun '(b, y) =>
            (a*b, x*y))
          q) p).

      Lemma eval_map_mul a x q :
        eval (map (fun '(b, y) => (a*b, x*y)) q) = a*x*eval q.
      Proof. induction q; push; nsatz. Qed.
      Hint Rewrite eval_map_mul : push.

      Lemma eval_mul : forall p q, eval (mul p q) = eval p * eval q.
      Proof. intros; induction p; cbv [mul]; push; nsatz. Qed.
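
The worked example can be replayed on concrete inputs. The sketch below reuses the slide's `mul` and defines `eval` with `fold_right Z.add 0` in place of `sum` (an assumption); `a_digits`, `b_digits`, and the example names are hypothetical. It checks the instance 321 × 76 only and does not reproduce the general lemmas, whose `push` and `nsatz` automation is not set up here.

```coq
From Coq Require Import ZArith List.
Import ListNotations.
Local Open Scope Z_scope.

Definition eval (ls : list (Z * Z)) : Z :=
  fold_right Z.add 0 (map (fun '(a, x) => a * x) ls).

(* Pairwise products: multiply weights with weights, values with values. *)
Definition mul (p q : list (Z * Z)) : list (Z * Z) :=
  concat (map (fun '(a, x) => map (fun '(b, y) => (a * b, x * y)) q) p).

Definition a_digits : list (Z * Z) := [(100, 3); (10, 2); (1, 1)]. (* 321 *)
Definition b_digits : list (Z * Z) := [(10, 7); (1, 6)].           (* 76 *)

(* Terms come out grouped by the digits of the first argument. *)
Example mul_terms :
  mul a_digits b_digits
  = [(1000, 21); (100, 18); (100, 14); (10, 12); (10, 7); (1, 6)].
Proof. reflexivity. Qed.

(* The product list denotes 321 * 76, an instance of eval_mul. *)
Example mul_value : eval (mul a_digits b_digits) = 321 * 76.
Proof. reflexivity. Qed.
```

The term order differs from the `ab` list on the slide, but `eval` is a sum, so the order does not affect the value.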

  32. But Are These the Implementations We’re Looking For?
      ● Ahead-of-time specialization for performance!
      ● List lengths, digit weights are compile-time
      ● Evaluate, partially (grab a coffee while trying this at home):
        – cbv -[blacklist] in (mul [(1,x);..] ..)

  33. Example Arithmetic Code
      Definition mul (p q : list (Z*Z)) : list (Z*Z) :=
        concat (map (fun '(a, x) =>
          map (fun '(b, y) =>
            (a*b, x*y))
          q) p).

      Annotated run-time operation: x*y

      Lemma eval_map_mul a x q :
        eval (map (fun '(b, y) => (a*b, x*y)) q) = a*x*eval q.
      Proof. induction q; push; nsatz. Qed.
      Hint Rewrite eval_map_mul : push.

      Lemma eval_mul : forall p q, eval (mul p q) = eval p * eval q.
      Proof. intros; induction p; cbv [mul]; push; nsatz. Qed.

  34. Partial Evaluation Example
      Eval cbv -[runtime_mul] in fun a0 a1 a2 b0 b1 b2 =>
        mul [(1, a0); (10, a1); (100, a2)] [(1, b0); (10, b1); (100, b2)].
      = fun a0 a1 a2 b0 b1 b2 =>
        [(1, a0*b0); (10, a0*b1); (100, a0*b2);
         (10, a1*b0); (100, a1*b1); (1000, a1*b2);
         (100, a2*b0); (1000, a2*b1); (10000, a2*b2)]
      ● Almost there; need to deduplicate the output list!
      fun a0 a1 a2 b0 b1 b2 =>
        [(1, a0*b0); (10, a0*b1 + a1*b0); (100, a0*b2 + a1*b1 + a2*b0);
         (1000, a1*b2 + a2*b1); (10000, a2*b2)]
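
A self-contained version of this experiment might look as follows. It is a sketch under one assumption: `runtime_mul` is modeled as a plain alias for `Z.mul` that the reduction is told not to unfold, which may differ from how the library actually marks run-time operations.

```coq
From Coq Require Import ZArith List.
Import ListNotations.
Local Open Scope Z_scope.

(* Assumption: a run-time multiplication is an alias that cbv leaves folded. *)
Definition runtime_mul : Z -> Z -> Z := Z.mul.

(* Weights use ordinary multiplication; values use the run-time one. *)
Definition mul (p q : list (Z * Z)) : list (Z * Z) :=
  concat (map (fun '(a, x) =>
    map (fun '(b, y) => (a * b, runtime_mul x y)) q) p).

(* Weights are literals, so a*b reduces fully; a0..b2 stay symbolic. *)
Eval cbv - [runtime_mul] in
  fun a0 a1 a2 b0 b1 b2 : Z =>
    mul [(1, a0); (10, a1); (100, a2)] [(1, b0); (10, b1); (100, b2)].
```

The result should be a nine-element list of the shape shown on the slide, with literal weights and each value left as an unevaluated `runtime_mul ai bj`.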

  35. Deduplication to Positional Repr.
      ● Run-time representation: fixed-length array → assign each term to the correct slot
      ● With slots for [1, 10, 100], where does (500, x) go?
        – Disallow? But proofs
        – Useful to handle for mixed-radix representations
      ● Verdict: to place (500, x), add 5·x to the 100s
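
The placement rule can be sketched in Coq as follows. The names `weight`, `place`, `add_to_nth`, and `from_associational` are chosen for illustration, the weights are fixed to powers of ten to match the slide's slots, and the definitions are an assumption about one reasonable way to implement the rule rather than the library's own code.

```coq
From Coq Require Import ZArith List.
Import ListNotations.
Local Open Scope Z_scope.

(* Assumed slot weights: 1, 10, 100, ... *)
Definition weight (i : nat) : Z := 10 ^ Z.of_nat i.

(* Find the highest slot at or below i whose weight divides the term's
   weight, and scale the value by the leftover factor. *)
Fixpoint place (t : Z * Z) (i : nat) {struct i} : nat * Z :=
  if (fst t mod weight i =? 0)
  then (i, (fst t / weight i) * snd t)
  else match i with
       | S i' => place t i'
       | O => (O, fst t * snd t)
       end.

(* Add x to the i-th entry of a list; out-of-range indices are ignored. *)
Fixpoint add_to_nth (i : nat) (x : Z) (ls : list Z) : list Z :=
  match i, ls with
  | O, y :: ys => (x + y) :: ys
  | S i', y :: ys => y :: add_to_nth i' x ys
  | _, [] => []
  end.

(* Place every term into an n-slot array that starts as all zeros. *)
Definition from_associational (n : nat) (p : list (Z * Z)) : list Z :=
  fold_right
    (fun t ls => let '(i, x) := place t (Nat.pred n) in add_to_nth i x ls)
    (repeat 0 n) p.

(* (500, 7) lands in the 100s slot as 5 * 7. *)
Example place_500 : place (500, 7) 2%nat = (2%nat, 35).
Proof. vm_compute. reflexivity. Qed.

(* Duplicate weights are merged by addition, slot by slot. *)
Example dedup :
  from_associational 4%nat
    [(1000, 21); (100, 18); (100, 14); (10, 12); (10, 7); (1, 6)]
  = [6; 19; 32; 21].
Proof. vm_compute. reflexivity. Qed.
```

Because the lowest weight is 1, every term finds a slot, so nothing has to be disallowed.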

  38. Three Tricks for Modular Reduction
      ● Pseudo-Mersenne
        – $m = 2^{nt} - c$ (c small)
      ● Solinas
        – $m = 2^{nt} - c$ (c sparse)
      ● Mixed-radix
        – $m = 2^{n(t/l)} - c$
        – Curve25519 on 32-bit, 2004
      ● One natural implementation will yield all 3!
      ● Key commonality: weight $w = 2^k$ s.t. $w \bmod m = c$
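
The key commonality can be illustrated with the `reduce` template from the workflow slide. In the sketch below, `split` and the use of list append for `add` are assumptions about how the template's helpers might be defined, and the decimal modulus 97 = 100 - 3 stands in for a real prime so the numbers stay readable.

```coq
From Coq Require Import ZArith List.
Import ListNotations.
Local Open Scope Z_scope.

Definition eval (ls : list (Z * Z)) : Z :=
  fold_right Z.add 0 (map (fun '(a, x) => a * x) ls).

Definition mul (p q : list (Z * Z)) : list (Z * Z) :=
  concat (map (fun '(a, x) => map (fun '(b, y) => (a * b, x * y)) q) p).

(* Assumed helper: separate terms whose weight is a multiple of s,
   dividing those weights by s. *)
Definition split (s : Z) (p : list (Z * Z)) : list (Z * Z) * list (Z * Z) :=
  let '(hi, lo) := partition (fun t : Z * Z => fst t mod s =? 0) p in
  (lo, map (fun t : Z * Z => (fst t / s, snd t)) hi).

(* Since s mod m = eval c, replacing s by c in the high part preserves
   the value modulo m. Assumption: add is list concatenation. *)
Definition reduce (s : Z) (c p : list (Z * Z)) : list (Z * Z) :=
  let '(lo, hi) := split s p in lo ++ mul c hi.

Definition prod : list (Z * Z) := [(1, 6); (10, 12); (100, 18); (1000, 21)].

(* With m = 97, s = 100, c = 3: terms at 100 and 1000 fold down by 3. *)
Example reduce_terms :
  reduce 100 [(1, 3)] prod = [(1, 6); (10, 12); (1, 54); (10, 63)].
Proof. vm_compute. reflexivity. Qed.

Example reduce_preserves_value :
  eval (reduce 100 [(1, 3)] prod) mod 97 = eval prod mod 97.
Proof. vm_compute. reflexivity. Qed.

(* The same pattern for Curve25519: the weight 2^255 is congruent to 19. *)
Example weight_25519 : 2^255 mod (2^255 - 19) = 19.
Proof. vm_compute. reflexivity. Qed.
```

Nothing in `reduce` depends on whether `c` is small, sparse, or attached to a mixed-radix weight, which is consistent with the claim that one implementation covers all three tricks.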
