Learning Strikes Again: the Case of the DRS Signature Scheme
Yang Yu (Tsinghua University), Léo Ducas (Centrum Wiskunde & Informatica)
January 2019, London

This is a cryptanalysis work... Target: DRS, a NIST lattice-based signature candidate.


Message reduction algorithm

Intuition: use s_i to reduce w_i; |w_i| decreases a lot, while for j ≠ i, |w_j| increases only a bit.

A reduction at i: $w \to w - q\, s_i$, with $q = \lfloor w_i / D \rfloor$ (so $w_i \to 0$). Then

$\|w - q s_i\|_1 = \sum_{k \ne i} |w_k - q s_{i,k}| + |w_i| - |q| \cdot D$   (since $q \cdot w_i > 0$)
$\le \sum_{k \ne i} \big(|w_k| + |q s_{i,k}|\big) + |w_i| - |q| \cdot D$
$= \|w\|_1 - |q| \cdot \big(D - \sum_{k \ne i} |s_{i,k}|\big)$
$< \|w\|_1$   (diagonal dominance)

⇒ message reduction always terminates!
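To make the termination argument concrete, here is a minimal Python sketch of the reduction loop under the stated diagonal-dominance assumption; the toy key generation, the coordinate-selection strategy (largest coordinate first) and all names are illustrative, not the DRS reference implementation.

```python
import numpy as np

def reduce_message(w, S, D):
    """Reduce w by rows of S until every |w_i| < D.

    Each step subtracts q * s_i with q truncated toward zero, so q * w_i > 0 and
    the l1-norm drops by at least |q| * (D - sum_{k != i} |S[i, k]|) > 0
    whenever S is diagonally dominant; hence the loop terminates.
    """
    w = np.array(w, dtype=np.int64)
    while True:
        i = int(np.argmax(np.abs(w)))
        if abs(int(w[i])) < D:
            return w                                    # reduced: w lies in (-D, D)^n
        q = (abs(int(w[i])) // D) * (1 if w[i] > 0 else -1)
        w = w - q * S[i]

# toy usage: a diagonally dominant secret with off-diagonal entries in {-1, 0, 1}
rng = np.random.default_rng(0)
n, D = 64, 100
E = rng.integers(-1, 2, size=(n, n)); np.fill_diagonal(E, 0)
S = D * np.eye(n, dtype=np.int64) + E                   # sum of |off-diagonal| <= 63 < D
w = reduce_message(rng.integers(-10**6, 10**6, size=n), S, D)
assert np.max(np.abs(w)) < D
```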

Resistance to NR attack

The support of w: (−D, D)^n. [Figure: the DRS domain vs. the parallelepiped P(S).]

The support is “zero-knowledge”, but maybe the distribution is not!

Outline
1 Background
2 DRS signature
3 Learning secret key coefficients
4 Exploiting the leaks
5 Countermeasures

Intuition

[Figure: the empirical distribution of (w_i, w_j) over (−D, D)², shown for S_{i,j} = −b, S_{i,j} = 0 and S_{i,j} = b.]

Correlations

Two sources of correlations between (w_i, w_j):
⋆ reduction at i and S_{i,j} ≠ 0
- reduction at k and S_{k,i}, S_{k,j} ≠ 0

⇒ S_{i,j} should be strongly related to W_{i,j} (the distribution of (w_i, w_j))!
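A hedged simulation of the first source: reduce many random messages with a toy diagonally dominant key (same style as the earlier sketch) and compare the empirical moment E(w_i · w_j) across pairs grouped by the sign of S_{i,j}. All parameters and names are illustrative; the actual attack works on real DRS signatures.

```python
import numpy as np

def reduce_message(w, S, D):
    # same toy reducer as before: largest coordinate first, quotient truncated toward zero
    w = np.array(w, dtype=np.int64)
    while np.max(np.abs(w)) >= D:
        i = int(np.argmax(np.abs(w)))
        q = (abs(int(w[i])) // D) * (1 if w[i] > 0 else -1)
        w = w - q * S[i]
    return w

rng = np.random.default_rng(1)
n, D, n_sig = 48, 100, 5000
E = rng.integers(-1, 2, size=(n, n)); np.fill_diagonal(E, 0)
S = D * np.eye(n, dtype=np.int64) + E

# collect reduced "signatures", rescaled to (-1, 1)
W = np.array([reduce_message(rng.integers(-10**5, 10**5, size=n), S, D)
              for _ in range(n_sig)], dtype=float) / D

M = W.T @ W / n_sig                                   # empirical E(w_i * w_j)
off = ~np.eye(n, dtype=bool)
for s in (-1, 0, 1):
    sel = off & (np.sign(E) == s)
    print(f"sign(S_ij) = {s:+d}:  mean E(w_i w_j) = {M[sel].mean():+.4f}")
```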

Figure out the model

Can we devise a formula S_{i,j} ≈ f(W_{i,j})? Seems complicated!
- cascading phenomenon: a reduction triggers another one
- parasite correlations

⇒ Search for the best linear fit f? The search space for all linear f is too large!
⇒ Choose some features {f_ℓ} and search in span({f_ℓ}), i.e. f = Σ_ℓ x_ℓ f_ℓ.

Training — feature selection

Lower degree moments:
f_1(W) = E(w_i · w_j)
f_2(W) = E(w_i · |w_i|^{1/2} · w_j)
f_3(W) = E(w_i · |w_i| · w_j)

[Figure: the three moment functions plotted over (w_i, w_j) ∈ (−1, 1)².]

Not enough!

Training — feature selection

[Figure, recalled from the intuition slide: the distribution of (w_i, w_j) over (−D, D)² for S_{i,j} = −b, 0, b.]

Training — feature selection

Pay more attention to the central region (i.e. |w_i| small):
f_4 = E( w_i (w_i − 1)(w_i + 1) · w_j )
f_5 = E( 2w_i (2w_i − 1)(2w_i + 1) · w_j  |  |2w_i| ≤ 1 )
f_6 = E( 4w_i (4w_i − 1)(4w_i + 1) · w_j  |  |4w_i| ≤ 1 )
f_7 = E( 8w_i (8w_i − 1)(8w_i + 1) · w_j  |  |8w_i| ≤ 1 )

[Figure: the four windowed features plotted over (w_i, w_j) ∈ (−1, 1)².]

Together with their transposes (i.e. f^t(w_i, w_j) = f(w_j, w_i)), we finally selected 7 × 2 − 1 = 13 features in the experiments.
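A sketch of how these 13 features could be estimated from collected signatures, assuming the coordinates have been rescaled to (−1, 1) by dividing by D; only the formulas above are taken from the slides, the function names and vectorization are illustrative.

```python
import numpy as np

def f_window(wi, wj, c):
    """E( c*wi*(c*wi - 1)*(c*wi + 1) * wj | |c*wi| <= 1 ): f4..f7 for c = 1, 2, 4, 8."""
    t = c * wi
    sel = np.abs(t) <= 1.0
    return float(np.mean(t[sel] * (t[sel] - 1.0) * (t[sel] + 1.0) * wj[sel]))

def base_features(wi, wj):
    """The 7 base features f1..f7 of the slides, estimated on paired samples (wi, wj)."""
    return np.array([
        np.mean(wi * wj),                              # f1
        np.mean(wi * np.sqrt(np.abs(wi)) * wj),        # f2
        np.mean(wi * np.abs(wi) * wj),                 # f3
        f_window(wi, wj, 1),                           # f4
        f_window(wi, wj, 2),                           # f5
        f_window(wi, wj, 4),                           # f6
        f_window(wi, wj, 8),                           # f7
    ])

def features_13(wi, wj):
    """f1..f7 plus the transposed f2^t..f7^t (f1 is symmetric), i.e. 7 * 2 - 1 = 13 features."""
    return np.concatenate([base_features(wi, wj), base_features(wj, wi)[1:]])

# usage sketch: wi = W[:, i], wj = W[:, j], where W holds the rescaled signatures row by row
```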

Training — model construction

S_{i,j} seems easier to learn when (i − j mod n) is smaller, so we build two models
f^+ = Σ_ℓ x^+_ℓ f_ℓ and f^− = Σ_ℓ x^−_ℓ f_ℓ, chosen according to (i − j mod n).

Models are built by a least-squares fit:
- 30 instances and 400,000 samples per instance
- 38 core-hours

Possible improvements:
- advanced machine learning techniques
- more blocks
- new features
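A minimal version of the least-squares step, assuming a feature matrix F (one 13-entry row per pair (i, j), computed on training instances whose secret coefficient S_{i,j} is known) and the corresponding target vector y; splitting the rows into the f^+ and f^− models according to (i − j mod n) is left to the caller. This is a sketch of the fitting idea, not the authors' training code.

```python
import numpy as np

def fit_model(F, y):
    """Least-squares fit of x in f = sum_l x_l f_l, i.e. minimize ||F x - y||_2."""
    x, *_ = np.linalg.lstsq(F, y, rcond=None)
    return x

def predict(F_new, x):
    """Apply the learned model: S'_{ij} = f(W_{ij}) = <features of (i, j), x>."""
    return F_new @ x

# usage sketch (hypothetical row split):
# x_plus  = fit_model(F[plus_rows],  y[plus_rows])    # pairs with (i - j) mod n in the "+" range
# x_minus = fit_model(F[minus_rows], y[minus_rows])   # remaining pairs
```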

The models

[Figure: the learned predictors f^− and f^+ plotted over (w_i, w_j) ∈ (−1, 1)².]

Learning

Let's learn a new S as S′ = f(W)!

[Figure: empirical distributions (probability density) of the predictor value f under the models f^− and f^+, for S_{i,j} ∈ {b, −b, 1, −1, 0}.]

Learning — location

S = D · I + E with E “absolute circulant” ⇒ more confidence via diagonal amplification.

Learning — location

The weight of the k-th diagonal: $W_k = \sum_i S'^2_{i, i+k}$.

[Figure: W^+_k and W^−_k for k = 0, ..., n − 1; the diagonals carrying large coefficients stand out.]
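A sketch of the diagonal amplification step: because the off-diagonal part of S is absolute circulant, the large coefficients of every row fall on the same cyclic diagonals, so summing S′² along each diagonal amplifies them. The code below assumes S_prime is the n × n matrix of learned guesses; names are illustrative.

```python
import numpy as np

def diagonal_weights(S_prime):
    """W_k = sum_i S'_{i, (i + k) mod n}^2 for k = 0..n-1 (k = 0 is the main diagonal)."""
    n = S_prime.shape[0]
    cols = (np.arange(n)[:, None] + np.arange(n)[None, :]) % n   # cols[i, k] = (i + k) mod n
    diags = S_prime[np.arange(n)[:, None], cols]                 # diags[:, k] = k-th cyclic diagonal
    return np.sum(diags ** 2, axis=0)

def locate_large_diagonals(S_prime, N_b):
    """Indices k of the N_b heaviest cyclic diagonals, excluding k = 0 (which holds the known D's)."""
    W = diagonal_weights(S_prime)
    W[0] = 0.0
    return np.sort(np.argsort(W)[-N_b:])
```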

Learning — location

#signatures   13/16   14/16   15/16   16/16
 50,000         5       3       6       6
100,000         -       -       -      20
200,000         -       -       -      20
400,000         -       -       -      20

Table: Location accuracy. The column labeled K/16 gives the number of tested instances in which the largest N_b scaled weights corresponded to exactly K large-coefficient diagonals.

We locate all large coefficients successfully! But we are still missing the signs...

Learning — sign

S_{i,j} ∈ {±b, ±1, 0}; restrict attention to the large coefficients S_{i,j} ∈ {±b}.

[Figure: probability density of the predictor value for S_{i,j} ∈ {b, −b, 1, −1, 0}, then restricted to S_{i,j} ∈ {b, −b}.]

Learning — sign

#signatures    p_l      p_u      p       p_row
400,000      0.9975   0.9939   0.9956   0.9323
200,000      0.9920   0.9731   0.9826   0.7546
100,000      0.9722   0.9330   0.9536   0.4675
 50,000      0.9273   0.8589   0.8921   0.1608

Table: Experimental measures for p_l, p_u, p and p_row, where
- p   = accuracy of guessing the sign of a large coefficient
- p_l = accuracy for a large coefficient in the lower triangle
- p_u = accuracy for a large coefficient in the upper triangle
- p_row = p^{N_b}, the probability of getting all large-coefficient signs in a row right

We can determine all large coefficients in one row! However, it is still hard to learn the small coefficients...
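As a sanity check on the last column: with N_b = 16 large coefficients per row (consistent with the K/16 columns of the location table), raising the per-coefficient accuracy p to the 16th power reproduces the reported p_row values:

```latex
p_{\mathrm{row}} = p^{N_b}:\qquad
0.9956^{16} \approx 0.932,\quad
0.9826^{16} \approx 0.755,\quad
0.9536^{16} \approx 0.468,\quad
0.8921^{16} \approx 0.161.
```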

Outline
1 Background
2 DRS signature
3 Learning secret key coefficients
4 Exploiting the leaks
5 Countermeasures

BDD & uSVP

BDD (Bounded Distance Decoding): given a lattice L and a target t “very close” to L, find v ∈ L minimizing ‖v − t‖.

uSVP (unique SVP): given a lattice L with λ_1(L) ≪ λ_2(L), find its shortest non-zero vector.

BDD ⇒ uSVP on the lattice L′ spanned by $\begin{pmatrix} B & 0 \\ t & 1 \end{pmatrix}$, with
$\lambda_1(L') = \sqrt{1 + \mathrm{dist}(t, L)^2}$ and $\mathrm{vol}(L') = \mathrm{vol}(L)$.
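A toy construction of this standard embedding (Kannan's embedding) in numpy; the resulting basis is what one would hand to a lattice-reduction routine such as BKZ. The helper name and the dense-matrix representation are illustrative.

```python
import numpy as np

def kannan_embedding(B, t):
    """Build the (d+1) x (d+1) basis [[B, 0], [t, 1]] of L'.

    If v in L(B) is the closest lattice vector to t, then (v - t, -1) lies in L'
    and has norm sqrt(dist(t, L)^2 + 1), matching lambda_1(L') above.
    """
    d = B.shape[0]
    top = np.hstack([B, np.zeros((d, 1), dtype=B.dtype)])
    bottom = np.hstack([np.atleast_2d(t).astype(B.dtype), np.ones((1, 1), dtype=B.dtype)])
    return np.vstack([top, bottom])

# the embedded basis is then reduced (e.g. with BKZ) to expose the unique short vector
```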

Solving uSVP by BKZ

Required blocksize β [ADPS16, AGVW17]:
$\sqrt{\beta/d} \cdot \lambda_1(L') \le \delta_\beta^{2\beta - d} \cdot \mathrm{vol}(L')^{1/d}$,
where d = dim(L′) and $\delta_\beta \approx \left( \frac{\beta}{2\pi e} \, (\pi\beta)^{1/\beta} \right)^{\frac{1}{2(\beta - 1)}}$ (β > 50).

Cost of BKZ-β [Che13, Alb17]: $C_{\mathrm{BKZ}\text{-}\beta} = 16\, d \cdot C_{\mathrm{SVP}\text{-}\beta}$.

Cost of solving SVP-β:
- Enum [APS15]: $2^{0.270\, \beta \ln \beta - 1.019\, \beta + 16.10}$
⋆ Sieve [Duc17]: $2^{0.396\, \beta + 8.4}$
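The following sketch simply wires these formulas together to estimate the attack cost; the DRS-specific inputs (d, λ_1(L′) and vol(L′)) must be supplied by the user, and the output is a model-based estimate in the same spirit as the 2^128 / 2^78 / 2^73 figures below, not a measured running time.

```python
import math

def delta(beta):
    """Root Hermite factor delta_beta (the slide's approximation, valid for beta > 50)."""
    return ((beta / (2 * math.pi * math.e)) * (math.pi * beta) ** (1.0 / beta)) ** (1.0 / (2 * (beta - 1)))

def required_blocksize(d, lambda1, log_vol):
    """Smallest beta with sqrt(beta/d) * lambda1 <= delta(beta)^(2*beta - d) * vol^(1/d).

    log_vol is the natural logarithm of vol(L'), to avoid overflow for large lattices.
    """
    for beta in range(51, d + 1):
        lhs = math.log(math.sqrt(beta / d) * lambda1)
        rhs = (2 * beta - d) * math.log(delta(beta)) + log_vol / d
        if lhs <= rhs:
            return beta
    return None

def log2_bkz_cost(beta, d, model="sieve"):
    """log2 of C_BKZ-beta = 16 * d * C_SVP-beta, with the enum or sieve SVP cost model."""
    if model == "enum":
        log2_svp = 0.270 * beta * math.log(beta) - 1.019 * beta + 16.10
    else:
        log2_svp = 0.396 * beta + 8.4
    return math.log2(16 * d) + log2_svp
```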

Leaks help a lot!

Attack without leaks: d = n + 1, λ_1(L′) = √(b² · N_b + N_1 + 1), cost: > 2^128.
Naive attack with leaks: d = n + 1, λ_1(L′) = √(N_1 + 1), cost: 2^78.
Improved attack with leaks: d = n − N_b, λ_1(L′) = √(N_1 + 1), cost: 2^73.

Improved BDD-uSVP attack

Red: D, ±b (known); blue: 0, ±1 (unknown).
[Diagram: t and s_k coordinate-wise; t carries 0 at the unknown (blue) positions.]

Let H = HNF(L) (triangular, with 1's on most of the diagonal), and write s_k = c H.
[Diagram: t, s_k and c coordinate-wise.]

Let M be such that
tM = (0, r),   s_k M = (b, r),   cM = (p, r).

Let $M^t H M = \begin{pmatrix} H' & 0 \\ H'' & I \end{pmatrix}$ and let L′ be the lattice spanned by
$\begin{pmatrix} H' & 0 \\ t' & 1 \end{pmatrix}$ with t′ = r H′′.

Then dim(L′) = n − N_b, vol(L′) = vol(L), and λ_1(L′) = ‖(b, 1)‖ = √(N_1 + 1).

Improved BDD-uSVP attack

Once one s_i is recovered exactly ⇒ all 0's in S are determined.

[Diagram: the first BDD instance (dim = n − N_b) next to the subsequent ones (dim = N_1 + N_b + 1 ≈ n/2).]

Recovering the secret matrix ≈ recovering a first secret vector.

Can we do better with the help of many t_k close to s_k? [KF17]

Conclusion

We present a statistical attack against DRS: given 100,000 signatures, security is below 80 bits; even less with the current progress of lattice algorithms.

Outline
1 Background
2 DRS signature
3 Learning secret key coefficients
4 Exploiting the leaks
5 Countermeasures

Modified DRS

In DRS: S = D · I + E is diagonal-dominant.

Version 1 [PSDS17]:
- E absolute circulant, E_{i,i} = 0
- three types of coefficients ({0}, {±1}, {±b}) with fixed numbers

Version 2 [PSDS18]:
- rows e_1, ..., e_n sampled uniformly at random from {v : ‖v‖_1 < D}
- variable diagonal elements

Impact:
- no circulant structure ⇒ diagonal amplification doesn't work
- coefficients are less sparsely distributed ⇒ less confidence when guessing

Learning attack on modified DRS

We regard each S_{i,j} as a random variable following the same distribution. Let S′ be the guess of S and N be the sample size.

As N grows, we hope that:
- Var(S_{i,j} − S′_{i,j}) < Var(S_{i,j}) ⇒ more confidence in the guess
- ‖s_i − s′_i‖ < ‖s_i‖ ⇒ the guessed vector gets closer to the lattice
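A small sketch of checking these two success criteria on simulated data, given the true S (known in experiments) and a guess S′ produced by whatever learning procedure is used; the function name and printing format are illustrative.

```python
import numpy as np

def leak_progress(S, S_prime):
    """Report whether the guess reduces the coefficient variance and the per-row distances."""
    diff = S - S_prime
    print("Var(S - S') =", np.var(diff), "  Var(S) =", np.var(S))
    closer = np.linalg.norm(diff, axis=1) < np.linalg.norm(S, axis=1)
    print("rows with ||s_i - s'_i|| < ||s_i|| :", int(closer.sum()), "out of", S.shape[0])
```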
