Efficient Side-Channel Protections of ARX Ciphers
Bernhard Jungk (1), Richard Petri (2), Marc Stöttinger (3)
(1) Fraunhofer Singapore, Singapore, bernhard.jungk@fraunhofer.sg
(2) Fraunhofer SIT, Germany, richard.petri@sit.fraunhofer.de
(3) Continental AG, Germany, marc.stoettinger@contiental-corporation.com
September 10, 2018

Protecting ARX Ciphers
◮ ARX ciphers (e.g. Threefish, Speck, ChaCha20) rely on modular Addition, Rotation and XOR
◮ Easily protected against timing side-channels, but all the harder to protect against power/EM side-channels, see e.g.
  ◮ “Butterfly Attack” against modular addition in Skein
  ◮ “Bricklayer Attack” on ChaCha20
◮ Early work by Goubin (2001) suggested Boolean and arithmetic masking, with conversion in-between (Cost: O(k))
◮ Simpler: Apply Boolean masking directly to an addition algorithm in software!
[Figure: state words a, b, c, d combined through four rotations (≪)]
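As a rough illustration of why only the addition is the hard part: XOR and rotation are linear over GF(2), so under first-order Boolean masking they can be applied to each share separately. The sketch below assumes 32-bit words and two shares; the helper names (`share`, `unshare`, `masked_xor`, `masked_rotl`) are hypothetical and not taken from the slides, and Python code like this only shows the functional behaviour, not a leakage-resistant implementation.

```python
import secrets

K = 32                      # assumed word size in bits
MASK = (1 << K) - 1

def share(x):
    """Split x into two Boolean shares (x ^ r, r) with a fresh random r."""
    r = secrets.randbits(K)
    return (x ^ r, r)

def unshare(s):
    """Recombine two Boolean shares into the plain value."""
    return s[0] ^ s[1]

def rotl(x, c):
    """Rotate a K-bit word left by c bits."""
    c %= K
    return ((x << c) | (x >> (K - c))) & MASK

def masked_xor(a, b):
    """XOR is linear: XOR the shares pairwise."""
    return (a[0] ^ b[0], a[1] ^ b[1])

def masked_rotl(a, c):
    """Rotation is linear: rotate each share on its own."""
    return (rotl(a[0], c), rotl(a[1], c))
```

Modular addition has no such share-wise form, because the carry chain mixes the shares through AND operations. This is what motivates a masked adder.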

Our contribution
◮ Threshold Implementations (TI) initially only of interest for hardware implementations until recent developments reduced the number of necessary shares
◮ We introduce some optimizations for masking additions
  ◮ Introduce masked versions of combined SHIFT-AND(-XOR) gates
  ◮ Include the “flexible second operand” of ARM platform, performing z ← x ∘ (y ≪ c) in one instruction (∘ being an operation such as AND or XOR)
  ◮ Reduce the number of necessary remasking steps, reducing amount of required entropy
◮ Not in this presentation: We introduce a simpler algorithm for modular subtraction

Kogge-Stone Adder (KSA)
[Figure: 8-bit Kogge-Stone adder; columns Bit 7 to Bit 0, rows Input, Iteration 1, Iteration 2, Iteration 3, Output; annotation: Combined SHIFT-AND(-XOR) gates]
Input node, from (x[b], y[b]):
  g[b] ← x[b] ∧ y[b]
  p[b] ← x[b] ⊕ y[b]
  output (g[b], p[b])
Iteration node, from (g[b], p[b]) and (g[b − 2^i], p[b − 2^i]):
  g[b] ← (p[b] ∧ g[b − 2^i]) ⊕ g[b]
  p[b] ← p[b] ∧ p[b − 2^i]
  output (g[b], p[b])
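The node equations can be evaluated on whole words at once, which is how a software implementation would typically use them: a shift by the current offset lines bit b − offset up with bit b. The following is a minimal unmasked sketch, assuming offsets 1, 2, 4, ... and a final sum of x ⊕ y ⊕ (g ≪ 1); the function name `ksa_add` is hypothetical.

```python
def ksa_add(x, y, k=32):
    """Add two k-bit words modulo 2**k with a Kogge-Stone carry network."""
    mask = (1 << k) - 1
    p = (x ^ y) & mask          # propagate: exactly one input bit set
    g = (x & y) & mask          # generate: both input bits set
    shift = 1
    while shift < k:
        # SHIFT-AND-XOR gate: fold in the generate signal from `shift` bits below
        g = ((p & (g << shift)) ^ g) & mask
        # SHIFT-AND gate: a group propagates only if both halves propagate
        p = (p & (p << shift)) & mask
        shift <<= 1
    # g[b] is now the carry out of bit b, i.e. the carry into bit b + 1
    return (x ^ y ^ (g << 1)) & mask
```

For k = 8 the loop runs three times (offsets 1, 2 and 4), matching the three iterations in the figure. XOR can replace the usual OR in the generate update because a group cannot both generate and propagate a carry.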

TI AND(-XOR) Gate with 2 shares
(z0 ⊕ z1) ← (x0 ⊕ x1) ∧ (y0 ⊕ y1)
  s0 ← x0 ∧ y0, s1 ← x0 ∧ y1
  s2 ← x1 ∧ y0, s3 ← x1 ∧ y1
  t0 ← s0 ⊕ m, t1 ← s1 ⊕ m
  z0 ← t0 ⊕ s2, z1 ← t1 ⊕ s3
◮ Direct approach to constructing an AND gate with four output shares, which are registered and recombined
◮ Output is not uniform, requiring remasking with a guard share m
◮ Typical software implementation processes k-shares in parallel
◮ [...] use one uniform input shares as guard share (just need one fresh bit)
◮ In the case of z [...] no guard share is required
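The gate equations can be checked functionally with a short sketch. It assumes 32-bit words, so that 32 gates are evaluated in parallel, and a fresh random guard share m per call; the name `ti_and` is hypothetical. Python offers no control over registers or evaluation order, so this only demonstrates correctness of the recombination, not side-channel resistance.

```python
import secrets

K = 32  # assumed word size: K one-bit gates evaluated in parallel

def ti_and(x, y, m):
    """Two-share AND gate with guard share m; x and y are (share0, share1)."""
    x0, x1 = x
    y0, y1 = y
    # Four partial products, one per pair of input shares
    s0, s1 = x0 & y0, x0 & y1
    s2, s3 = x1 & y0, x1 & y1
    # Remask with the guard share before recombining
    t0, t1 = s0 ^ m, s1 ^ m
    # Recombine into two output shares; m cancels in z0 ^ z1
    z0, z1 = t0 ^ s2, t1 ^ s3
    return (z0, z1)

def ti_and_demo(a, b):
    """Share a and b, run the gate, and return the recombined result."""
    ra, rb = secrets.randbits(K), secrets.randbits(K)
    z0, z1 = ti_and((a ^ ra, ra), (b ^ rb, rb), secrets.randbits(K))
    return z0 ^ z1
```

Since m is XORed into both t0 and t1, it cancels in z0 ⊕ z1, leaving s0 ⊕ s1 ⊕ s2 ⊕ s3 = (x0 ⊕ x1) ∧ (y0 ⊕ y1).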