How Fast Can Higher-Order Masking Be in Software? Dahmun Goudarzi - PowerPoint PPT Presentation

How Fast Can Higher-Order Masking Be in Software? Dahmun Goudarzi and Matthieu Rivain EUROCRYPT 2017, Paris

Outline
1. Introduction
2. Field Multiplications
3. Non-Linear Operations
4. Generic Polynomial Methods
5. Polynomial Methods for AES
6. The Bitslice Strategy

Higher-Order Masking
$x = x_1 + x_2 + \cdots + x_d$
• Linear operations: $O(d)$
• Non-linear operations: $O(d^2)$
• Challenge for blockciphers: S-boxes
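As a minimal sketch of the sharing above, assuming Boolean masking (the "+" is a bitwise XOR on n-bit values), the following Python splits a value into d shares and applies a linear operation share by share. The helper names `share`, `unshare` and `masked_xor` are illustrative and do not come from the slides.

```python
import secrets
from functools import reduce

def share(x, d, n=8):
    """Split the n-bit value x into d shares whose XOR equals x."""
    shares = [secrets.randbits(n) for _ in range(d - 1)]
    # The last share is chosen so that all d shares XOR to x.
    shares.append(reduce(lambda u, v: u ^ v, shares, x))
    return shares

def unshare(shares):
    """Recombine the shares (only done here to check the result)."""
    return reduce(lambda u, v: u ^ v, shares, 0)

def masked_xor(a_shares, b_shares):
    """A linear operation costs one XOR per share, hence O(d)."""
    return [a ^ b for a, b in zip(a_shares, b_shares)]
```

A non-linear operation cannot be applied share by share in this way, which is where the quadratic cost comes from.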

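Ishai-Sahai-Wagner Multiplication
$$\sum_i c_i = \Big(\sum_i a_i\Big) \times \Big(\sum_i b_i\Big) = \sum_{i,j} a_i \times b_j$$
$$
\begin{pmatrix}
a_1 b_1 & a_1 b_2 & \cdots & a_1 b_d \\
0 & a_2 b_2 & \cdots & \vdots \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & a_d b_d
\end{pmatrix}
+
\begin{pmatrix}
0 & 0 & \cdots & 0 \\
a_2 b_1 & 0 & \cdots & \vdots \\
\vdots & & \ddots & \vdots \\
a_d b_1 & a_d b_2 & \cdots & 0
\end{pmatrix}
+
\begin{pmatrix}
0 & r_{1,2} & \cdots & r_{1,d} \\
r_{1,2} & 0 & \cdots & \vdots \\
\vdots & & \ddots & r_{d,d-1} \\
r_{1,d} & \cdots & r_{d,d-1} & 0
\end{pmatrix}
$$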
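The matrix view above can be sketched in Python as follows: output share $c_i$ collects the entries of row $i$, and each random value $r_{i,j}$ appears twice so that it cancels in the sum. This is an illustrative sketch, not the optimized assembly of the talk. The field GF(16) with modulus $x^4 + x + 1$ and the names `gf16_mul` and `isw_mult` are assumptions made for the example.

```python
import secrets

def gf16_mul(a, b):
    """Multiply in GF(2^4) modulo x^4 + x + 1 (assumed modulus)."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:      # degree reached 4: reduce
            a ^= 0x13
    return r

def isw_mult(a, b, mul=gf16_mul, n=4):
    """ISW multiplication of two d-share vectors; O(d^2) field mults."""
    d = len(a)
    c = [mul(a[i], b[i]) for i in range(d)]       # diagonal terms
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(n)               # fresh random r_{i,j}
            c[i] ^= r
            # Parenthesised as in ISW: (r + a_i b_j) + a_j b_i
            c[j] ^= (r ^ mul(a[i], b[j])) ^ mul(a[j], b[i])
    return c
```

Recombining the output shares gives the product of the recombined inputs, whatever random values were drawn.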

The Polynomial Methods
• S-box seen as a polynomial over GF($2^n$): $S(x) = \sum_{i=0}^{2^n - 1} a_i x^i$
• Generic Methods: $S(x) = \sum_i (p_i \star q_i)(x)$
  ◮ CRV decomposition, $\star = \times$ (CHES 2014)
  ◮ Algebraic decomposition, $\star = \circ$ (CRYPTO 2015)
• AES Specific Methods: $S_{AES}(x) = \mathrm{Aff}(x^{254})$
  ◮ RP multiplication chain (CHES 2010)
  ◮ KHL multiplication chain (CHES 2011)

Our results
• Optimized implementations of state-of-the-art higher-order masking techniques
• Bottom-up approach:
  ◮ base field multiplication
  ◮ ISW/CPRR
  ◮ polynomial methods
• Finely tuned ARM assembly (parallelization)
• Alternative strategy: bitslice method (new AES and PRESENT speed records)

ARM
• 32-bit architecture with 16 registers (13 user-accessible registers)
• Barrel shifter: shifts and rotates are virtually free
• Example: x-times and add on GF(2)[x] in 1 cycle
    EOR $acc, $var, $acc, LSL #1
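To make the example concrete, the instruction computes `acc = var XOR (acc << 1)`, that is, it multiplies the accumulator by x and adds another polynomial in a single step. The Python sketch below emulates it on 32-bit values and shows one possible use, a Horner-style carry-less multiplication. The names `eor_lsl1` and `clmul` are illustrative, and the loop is not claimed to be the routine used in the talk.

```python
MASK32 = 0xFFFFFFFF

def eor_lsl1(var, acc):
    """Emulate `EOR acc, var, acc, LSL #1` on a 32-bit register."""
    return (var ^ (acc << 1)) & MASK32

def clmul(a, b, n):
    """Carry-less product of two n-bit polynomials over GF(2).

    Horner evaluation from the top bit of b: each step multiplies
    the accumulator by x and conditionally adds a.
    """
    acc = 0
    for i in reversed(range(n)):
        term = a if (b >> i) & 1 else 0
        acc = eor_lsl1(term, acc)
    return acc
```

For instance, with `acc` holding $x^2 + 1$ and `var` holding $x$, one step yields $x \cdot (x^2 + 1) + x = x^3$.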

Field Multiplication
• Goal: efficient implementation of multiplication over GF($2^n$)
• Fastest method: precomputed look-up table
• Limitation: constrained memory on embedded systems

n            4          5       6       7        8        9         10
Table size   0.25 kiB   1 kiB   4 kiB   16 kiB   64 kiB   512 kiB   2048 kiB
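The table sizes above are consistent with a full multiplication table of $2^{2n}$ entries. The short check below assumes each entry is stored in a whole number of bytes (one byte up to n = 8, two bytes beyond); that storage assumption and the name `full_table_bytes` are illustrative, not stated on the slide.

```python
def full_table_bytes(n):
    """Size in bytes of a table indexed by two n-bit operands."""
    entries = 1 << (2 * n)             # one entry per pair (a, b)
    bytes_per_entry = (n + 7) // 8     # assumed: whole bytes per entry
    return entries * bytes_per_entry

# Sizes in kiB for n = 4 .. 10
sizes_kib = [full_table_bytes(n) / 1024 for n in range(4, 11)]
```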

Field Multiplication

                bin mult v1   bin mult v2    exp-log v1     exp-log v2   kara.        half-tab          full-tab
clock cycles    10n + 3       7n + 3         18             16           19           10                4
registers       5             5              5              5            6            5                 5
code size (B)   52            2^(n-1) + 48   2^(n+1) + 48   3·2^n + 40   3·2^n + 42   2^(3n/2+1) + 24   2^(2n) + 12

Field Multiplication: Karatsuba
$a \times b = (a_h x^{n/2} + a_\ell) \times (b_h x^{n/2} + b_\ell)$
$\phantom{a \times b} = T_1[a_h \,|\, b_h] + T_2[a_\ell \,|\, b_\ell] + T_3[a_h + a_\ell \,|\, b_h + b_\ell]$
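One way to fill the three Karatsuba tables follows from expanding the product: $a b = a_h b_h (x^n + x^{n/2}) + a_\ell b_\ell (x^{n/2} + 1) + (a_h + a_\ell)(b_h + b_\ell) x^{n/2}$, all reduced modulo the field polynomial. The sketch below builds the tables for GF(16). The modulus $x^4 + x + 1$, the table layout, and the names `gf_mul`, `build` and `kara_mul` are assumptions made for the example.

```python
N, H, MOD = 4, 2, 0b10011      # GF(2^4), half size n/2, assumed modulus

def gf_mul(a, b):
    """Reference shift-and-add multiplication in GF(2^N)."""
    r = 0
    for _ in range(N):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r

X_H = 1 << H                   # x^(n/2)
X_N = gf_mul(X_H, X_H)         # x^n reduced modulo the field polynomial

def build(const):
    """Table of u * v * const, indexed by the concatenation u | v."""
    size = 1 << H
    return [gf_mul(gf_mul(u, v), const)
            for u in range(size) for v in range(size)]

T1 = build(X_N ^ X_H)          # a_h * b_h * (x^n + x^(n/2))
T2 = build(X_H ^ 1)            # a_l * b_l * (x^(n/2) + 1)
T3 = build(X_H)                # (a_h + a_l) * (b_h + b_l) * x^(n/2)

def kara_mul(a, b):
    ah, al = a >> H, a & (X_H - 1)
    bh, bl = b >> H, b & (X_H - 1)
    return (T1[(ah << H) | bh]
            ^ T2[(al << H) | bl]
            ^ T3[((ah ^ al) << H) | (bh ^ bl)])
```

Each table has $2^n$ entries, so the three together take $3 \cdot 2^n$ entries.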

Field Multiplication: Half table
$a \times b = (a_h x^{n/2} + a_\ell) \times (b_h x^{n/2} + b_\ell)$
$\phantom{a \times b} = T_1[a_h \,|\, a_\ell \,|\, b_h] + T_2[a_h \,|\, a_\ell \,|\, b_\ell]$
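The half-table formula can be sketched in the same spirit: the first table stores $a \cdot b_h \cdot x^{n/2}$ and the second stores $a \cdot b_\ell$, each indexed by the full operand $a$ and one half of $b$. The example uses GF(16) with the assumed modulus $x^4 + x + 1$; `gf_mul` and `half_tab_mul` are illustrative names.

```python
N, H, MOD = 4, 2, 0b10011      # GF(2^4), half size n/2, assumed modulus

def gf_mul(a, b):
    """Reference shift-and-add multiplication in GF(2^N)."""
    r = 0
    for _ in range(N):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> N:
            a ^= MOD
    return r

# T1[a | b_h] = a * b_h * x^(n/2),  T2[a | b_l] = a * b_l
T1 = [gf_mul(gf_mul(a, bh), 1 << H)
      for a in range(1 << N) for bh in range(1 << H)]
T2 = [gf_mul(a, bl)
      for a in range(1 << N) for bl in range(1 << H)]

def half_tab_mul(a, b):
    bh, bl = b >> H, b & ((1 << H) - 1)
    return T1[(a << H) | bh] ^ T2[(a << H) | bl]
```

Each table is indexed by $3n/2$ bits, so the two together hold $2^{3n/2+1}$ entries.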

Field Multiplication: code size for $n = 4$

            bin mult v1   bin mult v2   exp-log v1   exp-log v2   kara.   half-tab   full-tab
code size   52 B          56 B          80 B         88 B         90 B    152 B      268 B

• For $n = 4$: full table
  ◮ Fastest multiplication: 4 clock cycles
  ◮ Low code size: 268 B

Field Multiplication: code size for $n = 8$

            bin mult v1   bin mult v2   exp-log v1   exp-log v2   kara.   half-tab   full-tab
code size   52 B          176 B         560 B        808 B        810 B   8216 B     64 kiB

• For $n = 8$: exp-log or half-tab
  ◮ Tradeoff between clock cycles and code size
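As an illustration of the exp-log approach for n = 8, the sketch below multiplies two non-zero elements by adding their discrete logarithms. It assumes the AES field polynomial $x^8 + x^4 + x^3 + x + 1$ with generator 3, and it doubles the exp table so that no reduction modulo 255 is needed. Both choices, and the name `exp_log_mul`, are assumptions rather than details given on the slides.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return a ^ 0x11B if a & 0x100 else a

EXP = [0] * 510        # doubled so LOG[a] + LOG[b] is a valid index
LOG = [0] * 256
g = 1
for i in range(255):
    EXP[i] = EXP[i + 255] = g
    LOG[g] = i
    g ^= xtime(g)      # multiply by the generator 3 = x + 1

def exp_log_mul(a, b):
    """a * b = exp(log a + log b), with 0 handled separately."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]
```

The branch on zero is only for readability; a side-channel protected implementation would handle the zero case without a data-dependent branch.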

Quadratic Operations
• ISW
  ◮ Secure GF multiplication of 2 operands
  ◮ Might need refreshing (see paper for details)
• CPRR
  ◮ Evaluation of quadratic functions in 1 operand
  ◮ Similar to ISW, with GF multiplications replaced by look-up tables
  ◮ Requires twice as much randomness

Performance Comparison
[Figure: clock cycles of ISW-FT, ISW-HT, ISW-EL and CPRR for d = 3, 5, 10]
• ISW < CPRR when the tables are too large
• Asymptotic comparison: 1 CPRR ≈ 1.16 ISW-FT, 0.88 ISW-HT, 0.75 ISW-EL

Parallelization
• 32-bit register filled with only $n$-bit elements
• Perform several ISW/CPRR in parallel:
  ◮ $n = 4$: 8 elements/register
  ◮ $n = 8$: 4 elements/register
• Consequence:
  ◮ Parallel: load, store, xor, loops
  ◮ Sequential: GF mult, CPRR lookups
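The packing idea can be sketched as follows: several n-bit elements share one 32-bit word, so a single XOR adds all of them at once, while multiplications still have to unpack the elements. The helpers `pack` and `unpack` and the little-endian slot order are illustrative assumptions.

```python
def pack(elems, n):
    """Pack 32 // n elements of n bits each into one 32-bit word."""
    assert len(elems) == 32 // n
    word = 0
    for i, e in enumerate(elems):
        word |= (e & ((1 << n) - 1)) << (i * n)
    return word

def unpack(word, n):
    """Split a 32-bit word back into its 32 // n elements."""
    mask = (1 << n) - 1
    return [(word >> (i * n)) & mask for i in range(32 // n)]

# Eight GF(16) additions with one XOR on packed words
a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
packed_sum = pack(a, 4) ^ pack(b, 4)
```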

Performance Gain of Parallelization
[Figure: two panels, n = 8 (4 elements) and n = 4 (8 elements); clock cycles of sequential vs. parallel ISW-HT, ISW-FT, ISW-EL and CPRR for d = 3, 5, 10]
• Asymptotic ratio: CPRR 54%
• Asymptotic ratio: ISW 42%

Polynomial Decomposition
$S(x) = \sum_i q_i(x) \star p_i(x)$
• $q_i$: random linear combinations from a basis $B$
• Find $p_i$ by solving a linear system
• CRV vs AD:
  ◮ CRV [CRV14]: $\star$ = GF multiplication, evaluated with ISW multiplication
  ◮ AD [CPRR15]: $\star$ = composition, evaluated with CPRR

CRV Improvement
• Use CPRR for the basis computation
• Example for $n = 8$:

This paper                  CRV
$x^3 = x^3$                 $x^3 = x \cdot x^2$
$x^9 = (x^3)^3$             $x^7 = x \cdot (x^3)^2$
$x^5 = x^5$                 $x^{29} = x \cdot (x^7)^4$
$x^{25} = (x^5)^5$          $x^{87} = x^3 \cdot x^{29}$
$x^{125} = (x^{25})^5$      $x^{251} = (x^6)^{16} \cdot (x^{87})^{128}$
$x^{115} = (x^{125})^5$
6 CPRR                      5 ISW
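The exponents in the chain $x^3, x^9, x^5, x^{25}, x^{125}, x^{115}$ can be checked with a little arithmetic: in GF($2^8$) every non-zero element satisfies $x^{255} = 1$, so exponents compose modulo 255. The maps $x \mapsto x^3$ and $x \mapsto x^5$ have exponents of Hamming weight 2, which makes them quadratic and hence candidates for one CPRR evaluation each. The sketch below only verifies this exponent arithmetic; the names `compose` and `is_quadratic` are illustrative.

```python
ORDER = 2**8 - 1    # multiplicative order of GF(2^8)

def compose(e, k):
    """Exponent of (x^e)^k, reduced modulo 2^8 - 1."""
    return (e * k) % ORDER

def is_quadratic(k):
    """x -> x^k has algebraic degree 2 iff k has Hamming weight 2."""
    return bin(k).count("1") == 2

cube_chain = [3, compose(3, 3)]
fifth_chain = [5, compose(5, 5), compose(25, 5), compose(125, 5)]
```

The last step wraps around: $125 \cdot 5 = 625 = 2 \cdot 255 + 115$.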

Implementation Results
[Figure: two panels, n = 4 (8 s-boxes in parallel) and n = 8 (4 s-boxes in parallel); clock cycles of algebraic decomposition, CRV-FT, CRV-HT and CRV-EL for d = 3, 5, 10]

Polynomial Methods for AES
• Based on the specific algebraic structure of the AES: $S(x) = \mathrm{Aff}(x^{254})$
• RP10 method: 4 ISW mult
  ◮ Security flaw due to refreshing
  ◮ Patch [CPRR13]: 1 CPRR + 3 ISW
  ◮ Improvement [GPS14]: 3 CPRR + 1 ISW
• KHL11 method: 5 ISW mult on GF(16)
  ◮ Patch [this paper]: 1 CPRR + 4 ISW
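To show where the count of 4 multiplications comes from, the unmasked sketch below raises $x$ to the power 254 with the addition chain usually associated with RP10: $x^2, x^3, x^{12}, x^{15}, x^{240}, x^{252}, x^{254}$. Squarings are linear over GF(2) and are therefore not counted among the non-linear multiplications. The AES polynomial 0x11B is assumed, the names are illustrative, and none of the masking, refreshing or CPRR patches listed above are modelled.

```python
def gf256_mul(a, b):
    """Multiply in GF(2^8) modulo the AES polynomial 0x11B."""
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return r

def sq(x, times=1):
    """Repeated squaring: a GF(2)-linear map."""
    for _ in range(times):
        x = gf256_mul(x, x)
    return x

def pow254(x):
    """x^254 with four non-linear multiplications."""
    x2 = sq(x)                    # linear
    x3 = gf256_mul(x, x2)         # multiplication 1
    x12 = sq(x3, 2)               # linear
    x15 = gf256_mul(x3, x12)      # multiplication 2
    x240 = sq(x15, 4)             # linear
    x252 = gf256_mul(x240, x12)   # multiplication 3
    return gf256_mul(x252, x2)    # multiplication 4
```

Since $x^{255} = 1$ for every non-zero $x$, `pow254(x)` is the field inverse of $x$, and it maps 0 to 0.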

Implementation Results
• 16 s-boxes in parallel
[Figure: clock cycles (×10^3) of KHL, RP-HT and RP-EL for d = 3, 5, 10]
• KHL < RP-∗: smaller elements, hence a higher parallelization degree

Bitslice for the AES
• S-box seen as a Boolean circuit
[Figure: Boolean circuit on bitsliced inputs x_1, x_2, ..., x_n; XOR gates map to CPU XOR instructions, AND gates to CPU AND instructions]
• 16 S-boxes in parallel

Application for AES S-boxes
• Circuit for the AES S-box [BMP13]
  ◮ 83 XOR gates
  ◮ 32 AND gates
• Bitslice (16 s-boxes)
  ◮ 83 XOR instructions
  ◮ 32 AND instructions
• Masking at the order $d$:
  ◮ 83 × $d$ XOR instructions
  ◮ 32 ISW-AND
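An ISW-AND can be sketched as the ISW multiplication with the field multiplication replaced by a bitwise AND on bitsliced words, so that every bit position carries an independent masked AND gate. The sketch below uses 16-bit words, one bit per S-box; the name `isw_and` and the `width` parameter are illustrative.

```python
import secrets

def isw_and(a, b, width=16):
    """Masked AND of two d-share vectors of bitsliced words."""
    d = len(a)
    c = [a[i] & b[i] for i in range(d)]
    for i in range(d):
        for j in range(i + 1, d):
            r = secrets.randbits(width)   # one random bit per slot
            c[i] ^= r
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c
```

XOR gates, being linear, are simply applied to each of the d shares, which gives the 83 × d XOR instructions counted above.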

Improvement
• Two 16-bit ISW-AND are merged into one 32-bit ISW-AND
• Goal: grouping AND gates in pairs
• Validation on the BMP circuit
• 16 s-boxes = 16 ISW-AND, i.e. 1 ISW-AND per s-box

Performance Comparison of ISW
[Figure: clock cycles of ISW-AND (32 parallel ANDs), ISW-FT (8 parallel GF(16) multiplications) and ISW-HT (4 parallel GF(256) multiplications) for d = 3, 5, 10]
