Introduction to Side-Channel Analysis
François-Xavier Standaert, UCL Crypto Group, Belgium
Summer school on real-world crypto, 2016
Outline
• Link with linear cryptanalysis
• Standard Differential Power Analysis
• Noise-based security (is not enough)
• CPA vs Gaussian templates
• Post-processing the traces
• Noise amplification (aka masking)
• Conclusions & advanced topics
Linear cryptanalysis (I)
Linear cryptanalysis (II)
• Main characteristics
  • Divide-and-conquer attack
  • Data complexity ∝ 1/ε², with ε = 2^{n−1} · ∏_{t=1}^{n} ε_t (n S-boxes in the approximation A, with biases ε_t; piling-up lemma)
  • Time complexity ≈ # of active S-boxes in R1
• Countermeasures
  • Data: good (non-linear) S-boxes
  • Data & time: many active S-boxes
  • Data: larger number of rounds
• AES: ε < 2^{−64} after a few rounds
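A minimal sketch (not from the slides; the helper name and example biases are hypothetical) of how the piling-up lemma turns per-S-box biases into a combined bias, and of the resulting 1/ε² data-complexity estimate:

```python
import math

def combined_bias(biases):
    """Piling-up lemma: eps = 2^(n-1) * prod(eps_t) for n active S-boxes."""
    n = len(biases)
    return 2 ** (n - 1) * math.prod(biases)

# Hypothetical example: 4 active S-boxes, each with bias 2^-3
eps = combined_bias([2 ** -3] * 4)
print(f"combined bias 2^{math.log2(eps):.0f}, "
      f"data complexity ~2^{-2 * math.log2(eps):.0f} known plaintexts")
```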
Side-channel cryptanalysis
Differential Side-Channel Analysis
• Main characteristics
  • Divide-and-conquer attack
  • Data complexity ∝ 1/MI(K; X, L)
  • Time complexity ∝ # of S-boxes predicted
• Linear cryptanalysis countermeasures
  • Good (non-linear) S-boxes
  • Many active S-boxes ?
  • Larger number of rounds
• Unprotected implementation: MI(K; X, L) > 0.01
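Since the data complexity is driven by MI(K; X, L), it is instructive to estimate that quantity for a simple leakage model. A minimal sketch, assuming Hamming-weight leakage with Gaussian noise and using the 4-bit PRESENT S-box as a stand-in target (the slides target AES; the function names and the Monte-Carlo estimator below are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# PRESENT S-box as a small 4-bit stand-in for the AES S-box (assumption)
SBOX = np.array([0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
                 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2])

def hw(v):
    """Hamming weight of small non-negative integers (vectorized)."""
    bits = np.unpackbits(np.asarray(v, dtype=np.uint8)[..., None], axis=-1)
    return bits.sum(axis=-1)

def mi_key_leakage(sigma, n=200_000):
    """Monte-Carlo estimate of MI(K; X, L) with L = HW(S(x ^ k)) + N(0, sigma^2)."""
    k = rng.integers(0, 16, n)                       # uniform secret nibble
    x = rng.integers(0, 16, n)                       # known input nibble
    l = hw(SBOX[x ^ k]) + rng.normal(0, sigma, n)    # noisy leakage samples
    models = hw(SBOX[x[:, None] ^ np.arange(16)])    # predicted HW for all 16 keys
    dens = np.exp(-(l[:, None] - models) ** 2 / (2 * sigma ** 2))
    p_k = dens / dens.sum(axis=1, keepdims=True)     # Pr[k* | x, l] via Bayes
    # MI(K; X, L) = H(K) + E[log2 Pr[correct k | x, l]], with H(K) = 4 bits
    return 4 + np.mean(np.log2(p_k[np.arange(n), k]))

for sigma in (0.5, 1.0, 2.0, 4.0):
    print(f"sigma = {sigma}: MI(K; X, L) ~ {mi_key_leakage(sigma):.3f} bits")
```

As expected, the estimated MI shrinks as the noise grows, which translates into a proportionally larger data complexity.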
Outline
• Link with linear cryptanalysis
• Standard Differential Power Analysis
• Noise-based security (is not enough)
• CPA vs Gaussian templates
• Post-processing the traces
• Noise amplification (aka masking)
• Conclusions & advanced topics
Standard DPA
Measurement & pre-processing
• Noise reduction via good setups (!)
• Filtering, averaging (FFT, SSA, …)
• Detection of Points-Of-Interest (POI)
• Dimensionality reduction (PCA, LDA, …)
• …
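As an illustration of the averaging and dimensionality-reduction steps, a minimal numpy sketch (an assumption of mine, not the slides' tooling; function names are hypothetical):

```python
import numpy as np

def average_traces(traces, group_ids):
    """Average repeated measurements sharing the same input (noise reduction)."""
    out = np.zeros((group_ids.max() + 1, traces.shape[1]))
    for g in np.unique(group_ids):
        out[g] = traces[group_ids == g].mean(axis=0)
    return out

def pca_reduce(traces, n_components=5):
    """Project traces onto their leading principal components (POI compression)."""
    centered = traces - traces.mean(axis=0)
    # principal directions via SVD of the centered trace matrix
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```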
Prediction and modeling
• General case: profiled DPA
  • Build “templates”, i.e. estimates f̂(l_i | k, x_i)
  • e.g. Gaussian, regression-based
  • Which directly leads to Pr[k | l_i, x_i]
• “Simplified” case: non-profiled DPA
  • Just assumes some a priori model, e.g. m_i = HW(z_i) for the target intermediate value z_i
• Separation: only profiled DPA is guaranteed to succeed against any leaking device (!)
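A minimal sketch of the profiled route (my assumptions: univariate Gaussian templates, a known S-box, and hypothetical helper names), building f̂(l | z) from profiling data and turning a fresh leakage into Pr[k | l, x]:

```python
import numpy as np

def build_templates(leakages, z_values, n_classes=16):
    """Estimate mean and variance of the leakage for each intermediate value z."""
    mus = np.array([leakages[z_values == z].mean() for z in range(n_classes)])
    var = np.array([leakages[z_values == z].var() for z in range(n_classes)])
    return mus, var

def key_probabilities(l, x, mus, var, sbox):
    """Pr[k | l, x] for every key candidate, assuming z = Sbox(x ^ k)."""
    ks = np.arange(len(sbox))
    z = sbox[x ^ ks]                                   # predicted intermediates
    dens = np.exp(-(l - mus[z]) ** 2 / (2 * var[z])) / np.sqrt(2 * np.pi * var[z])
    return dens / dens.sum()                           # normalize via Bayes' rule
```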
Exploitation
• Profiled case: maximum likelihood
• Unprofiled case:
  • Difference-of-Means
  • Correlation (CPA)
  • « On-the-fly » regression
  • Mutual Information Analysis (MIA)
  • …
Illustration
• Gaussian templates: k̃ = argmax_{k*} ∏_{i=1}^{q} 1/(√(2π)·σ̂(L)) · exp(−(l_i − m̂_{k*,i})² / (2·σ̂(L)²))
  • More efficient (why?)
  • Outputs probabilities
• CPA: k̃ = argmax_{k*} (Ê(L·M_{k*}) − Ê(L)·Ê(M_{k*})) / (σ̂(L)·σ̂(M_{k*}))
  • Less efficient (why?)
  • Outputs scores
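A minimal CPA sketch (my assumption, not the slides' code; `hw` is a precomputed Hamming-weight lookup table and `sbox` a numpy array), scoring every key candidate by its Pearson correlation with the measured leakages:

```python
import numpy as np

def cpa_attack(leakages, plaintexts, sbox, hw):
    """Return key candidates sorted by |Pearson correlation|, best first."""
    scores = np.empty(len(sbox))
    for k in range(len(sbox)):
        model = hw[sbox[plaintexts ^ k]]       # predicted leakage M_k
        scores[k] = abs(np.corrcoef(leakages, model)[0, 1])
    return np.argsort(scores)[::-1]
```

Gaussian templates would instead rank candidates by their product of likelihoods, as in the sketch after the “Prediction and modeling” slide; that is what makes them (asymptotically) more efficient, at the cost of a profiling phase.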
Outline • Link with linear cryptanalysis • Standard Differential Power Analysis • Noise-based security (is not enough) • CPA vs Gaussian templates • Post-processing the traces • Noise amplification (aka masking) • Conclusions & advanced topics
First-order CPA (I)
• Lemma 1. The mutual information between two normally distributed random variables X, Y with means μ_X, μ_Y and variances σ_X², σ_Y² equals: MI(X; Y) = −(1/2) · log₂(1 − ρ(X, Y)²)
• Lemma 2. In a CPA, the number of samples required to distinguish the correct key (with model M_k) from the other key candidates (with models M_{k*}) is ∝ c/ρ(M_k, L)² (with c a small constant depending on the SR & # of key candidates)
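A small numeric illustration of Lemma 2's c/ρ² behaviour. The concrete formula and constant below are my assumptions, not the slides': the classic rule of thumb n = 3 + 8·z²/ln²((1+ρ)/(1−ρ)), which for small ρ reduces to ≈ 2z²/ρ², i.e. c ≈ 28 for a high-confidence setting with z ≈ 3.72:

```python
import math

z = 3.72  # assumed high-confidence quantile; c = 2*z^2 ~ 28
for rho in (0.5, 0.1, 0.01):
    n = 3 + 8 * z ** 2 / math.log((1 + rho) / (1 - rho)) ** 2
    print(f"rho = {rho:>4}: ~{n:,.0f} traces "
          f"(small-rho approximation c/rho^2: {28 / rho ** 2:,.0f})")
```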
First-order CPA (II)
• Lemma 3. Let X, Y and L be three random variables s.t. Y = X + N₁ and L = Y + N₂, with N₁ and N₂ two independent additive noise variables. Then: ρ(X, L) = ρ(X, Y) · ρ(Y, L)
• Lemma 4. The correlation coefficient between the sum of n independent and identically distributed random variables and the sum of the first m < n of these equals √(m/n)
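Both lemmas are easy to check by simulation; a quick Monte-Carlo sketch (mine, with arbitrary noise levels):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 200_000
r = lambda a, b: np.corrcoef(a, b)[0, 1]

# Lemma 3: rho(X, L) = rho(X, Y) * rho(Y, L)
x = rng.normal(size=n_samples)
y = x + rng.normal(scale=1.5, size=n_samples)   # Y = X + N1
l = y + rng.normal(scale=2.0, size=n_samples)   # L = Y + N2
print(r(x, l), r(x, y) * r(y, l))               # the two values should match

# Lemma 4: corr(sum of first m terms, sum of all n i.i.d. terms) = sqrt(m/n)
m, n = 8, 32
v = rng.normal(size=(n_samples, n))
print(r(v[:, :m].sum(axis=1), v.sum(axis=1)), np.sqrt(m / n))
```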
Paper & pencil estimations (I)
• FPGA implementation of the AES
• Adversary targeting the 1st byte of the key
• Hamming weight leakage function/model
• 8-bit loop architecture broken in 10 traces
• How does the attack data complexity scale
  • For a 32-bit architecture?
    • i.e. with 24 bits of « algorithmic noise »
  • For a 128-bit architecture?
    • i.e. with 120 bits of « algorithmic noise »
Paper & pencil estimations (II)
• Hint: L = M + N = M_P + M_U + N (predicted part M_P, unpredicted part M_U, measurement noise N)
• Lemma 3: ρ(M_P, L) = ρ(M_P, M) · ρ(M, L)
• Lemma 4: ρ(M_P, M) = ?
  • For the 8-bit architecture: √(8/8) = 1
  • For the 32-bit architecture: √(8/32) = 1/2
  • For the 128-bit architecture: √(8/128) = 1/4
• By Lemma 2 (data complexity ∝ 1/ρ²), the 32-bit and 128-bit architectures need ≈ 4× and ≈ 16× more traces, i.e. ≈ 40 and ≈ 160: algorithmic noise alone only buys a small polynomial factor
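The same arithmetic, spelled out (using the 10-trace baseline from the previous slide as the reference point):

```python
import math

base_traces, base_bits = 10, 8  # 8-bit loop architecture broken in 10 traces
for bits in (8, 32, 128):
    rho_ratio = math.sqrt(base_bits / bits)   # Lemma 4: rho scales by sqrt(8/W)
    traces = base_traces / rho_ratio ** 2     # Lemma 2: data complexity ~ 1/rho^2
    print(f"{bits:3}-bit architecture: rho factor {rho_ratio:.2f}, ~{traces:.0f} traces")
```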