side channel fault attacks

Side-Channel & Fault Attacks Ruggero Susella System Research - PowerPoint PPT Presentation

Side-Channel & Fault Attacks • Ruggero Susella, System Research & Applications, Security Roadmap, STMicroelectronics • 2018/12/06 • Who are we? STMicroelectronics: a global semiconductor leader, 2017 revenues of $8.35B


  1. Consumption Model 34 • Instantaneous power consumption in digital CMOS devices: $P(t) = P_{\text{const}}(t) + P_{\text{instr}}(t) + P_{\text{data}}(t) + P_{\text{noise}}(t)$ • $P_{\text{const}}(t)$ is unimportant for DPA • $P_{\text{instr}}(t)$ is fixed by the particular instruction executed • $P_{\text{data}}(t)$ is due to the currently processed data • $P_{\text{noise}}(t)$ has to be minimized • DPA exploits the differences in $P(t)$ that are due to $P_{\text{data}}(t)$ • The basic idea is to associate the device power consumption with the values processed

  2. Hamming Weight Model 35 • Tries to estimate $P_{\text{data}}(t)$ • Based on the assumption that a bit set to 1 consumes more than a bit set to 0 • Very simple model • Yet still in use today • Sometimes the Hamming Distance Model is preferable • It measures the transitions of a signal or register • A transition is a bit changing its value
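The two leakage models above can be sketched in a few lines of Python. The function names `hamming_weight` and `hamming_distance` are illustrative choices, not taken from the slides.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1: a rough estimate of the data-dependent power."""
    return bin(value).count("1")


def hamming_distance(old: int, new: int) -> int:
    """Number of bits that change when a register goes from `old` to `new`."""
    return hamming_weight(old ^ new)


# A register holding 0xF0 that is overwritten with 0x0F toggles all 8 bits,
# although both values have the same Hamming weight.
weight_before = hamming_weight(0xF0)
toggled_bits = hamming_distance(0xF0, 0x0F)
```

The Hamming distance is simply the Hamming weight of the XOR of the old and new values, which is why the two models are easy to swap in an analysis script.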

  3. Sensitive Variable 36 • A DPA attack works if a relation exists between the power consumption and a target “sensitive variable” • A sensitive variable is a value: • Actually computed during the execution • Made by a combination of: • A portion of the key (e.g. 1 bit, 1 byte) • A value that is known to the attacker and changes at every execution (e.g. the input)

  4. DPA: Basic Idea (1/3) 37 • Collect the side channel of the execution of the algorithm, providing different inputs • Input_0 → Trace_0 • Input_1 → Trace_1 • … • Input_n → Trace_n • Identify a sensitive variable in the algorithm • E.g. SV = Input[0] XOR Key[0] • Our target will be Key[0] • For all Input_0…Input_n, and for all m possible values j of Key[0] (j = 0…m-1), compute HW(Input_i[0] XOR j) • Create a table of guesses: one row per input Input_i (i = 0…n), one column per key guess j, with entry (i, j) = HW(Input_i[0] XOR j)

  5. DPA: Basic Idea (2/3) 38 • Create a matrix with the traces: one row per trace (n rows), one column per time sample • For each column (time sample), compute the correlation coefficient with every column of the guess table • The result is indexed by key guess and time sample

  6. DPA: Basic Idea (3/3) 39 • The result is a matrix of correlation traces (one per key guess, one column per time sample) • In (m-1) correlation traces we correlated the side-channel traces with intermediate values that are never computed • Because the key guess is wrong • So it is like correlating with a random vector • Expected correlation is close to zero • But in one correlation trace we correlated the side-channel traces with intermediate values that are actually computed • At the point in time when our sensitive variable is computed, we expect a peak towards 1
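The three DPA steps can be tried end to end on simulated traces. The following is a minimal sketch, not a real acquisition: it assumes a perfect Hamming-weight leak of `Input[0] XOR Key[0]` at one time sample plus Gaussian noise, and every name and parameter in it (`simulate_traces`, `correlation_matrix`, the key value `0x3C`, the leak position, the noise level) is hypothetical.

```python
import random


def hw(value):
    """Hamming weight of an integer."""
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(key_byte, inputs, n_samples, leak_at, noise_sigma, rng):
    """One noisy trace per input; HW(input XOR key) leaks at sample `leak_at`."""
    traces = []
    for p in inputs:
        trace = [rng.gauss(0.0, noise_sigma) for _ in range(n_samples)]
        trace[leak_at] += hw(p ^ key_byte)
        traces.append(trace)
    return traces


def correlation_matrix(traces, inputs, n_guesses=256):
    """Rows: key guesses. Columns: time samples. Entries: correlation."""
    columns = list(zip(*traces))  # one column per time sample
    matrix = []
    for guess in range(n_guesses):
        model = [hw(p ^ guess) for p in inputs]  # one column of the guess table
        matrix.append([pearson(model, col) for col in columns])
    return matrix


SECRET_KEY_BYTE = 0x3C  # hypothetical secret, used only to simulate the device
LEAK_SAMPLE = 6
rng = random.Random(1)
inputs = [rng.randrange(256) for _ in range(400)]
traces = simulate_traces(SECRET_KEY_BYTE, inputs, 10, LEAK_SAMPLE, 0.5, rng)
corr = correlation_matrix(traces, inputs)

# The best guess is the row containing the highest correlation peak.
best_guess, best_sample = max(
    ((g, s) for g in range(256) for s in range(10)),
    key=lambda gs: corr[gs[0]][gs[1]],
)
```

With a purely linear sensitive variable such as an XOR, the bitwise complement of the correct key gives the same peak with opposite sign, and guesses that differ in one bit still correlate strongly; this is why the sketch ranks guesses by signed correlation.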

  7. Workbench for Power Analysis

  8. SPEAr board 41 • New resistor R in series with the SoC power supply • GPIO used for trigger

  9. Oscilloscope 42 • Agilent Infiniium • Features: • max 40 GSa/s • max 2M samples • 4 channels • Differential probe • Voltage difference measurement on a resistor • Simple probe • Trigger detection

  10. Workbench 43 • PC (Linux) • Commands the board • Cross-compiles for ARM • Oscilloscope • Waits for trigger • Averages out the trace • Saves the trace • SPEAr board • Runs crypto algorithm • Generates trigger

  11. Single Power Trace 44

  12. Mean of 1000 Power Traces 45

  13. Workbench for EM Analysis 46 • Digital scope: LeCroy WavePro, 40 GS/s, 6 GHz bandwidth • XY stage (resolution up to 0.1 µm) • Wideband amplifier (Miteq + Femto) • EM probes (Langer + handmade)

  14. Timing Attacks

  15. What is a Timing Attack 48 • A side channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms • In some cases, exploitable from remote locations • Effective if the computation time depends on the secret • Requires high-accuracy timing measurements of the encryption • Measurement noise and resolution must be smaller than the timing difference we want to measure

  16. Vulnerability comes from… 49 • Sometimes it is a matter of the algorithm • Often, algorithms leak information through timing differences because computational steps depend on data values • Choose a constant-time algorithm to avoid these attacks • E.g. modular exponentiation (we will see it later) can be done with the Square&Multiply algorithm (variable-time) or with Square&Multiply Always (constant-time) • Otherwise, it can be a matter of implementation • A Cache-Timing Attack takes advantage of data-dependent timing variations during accesses to the cache (longer computation time on a cache miss) • It exploits implementations in which secret data is used as an array index (e.g. AES Sbox) • Almost every implementation can be made constant-time in order to avoid these attacks
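The Square&Multiply example can be made concrete by counting modular operations instead of measuring time. This is a sketch under that assumption: the operation count stands in for execution time, the function names are illustrative, and Python big-integer arithmetic is itself not constant-time, so this only shows the algorithmic difference.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right exponentiation; multiplies only when the exponent bit is 1."""
    result, operations = 1, 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        operations += 1
        if bit == "1":  # secret-dependent extra work
            result = (result * base) % modulus
            operations += 1
    return result, operations


def square_and_multiply_always(base, exponent, modulus, n_bits):
    """Same result, but one square and one multiply for every exponent bit."""
    result, operations = 1, 0
    for i in reversed(range(n_bits)):
        result = (result * result) % modulus
        product = (result * base) % modulus  # always computed
        operations += 2
        if (exponent >> i) & 1:
            result = product  # kept only when the bit is 1
    return result, operations


# Two 7-bit exponents with different Hamming weights.
sparse_result, sparse_ops = square_and_multiply(5, 0b1000001, 1009)
dense_result, dense_ops = square_and_multiply(5, 0b1111111, 1009)
always_sparse = square_and_multiply_always(5, 0b1000001, 1009, 7)
always_dense = square_and_multiply_always(5, 0b1111111, 1009, 7)
```

In the variable-time version the operation count reveals the number of 1 bits in the exponent, while the "always" version performs the same number of operations for both exponents. The dummy multiplication is itself a known target for fault attacks, so it addresses timing only.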

  17. Timing attack chart example 50

  18. Agenda 51 • Side Channel Attacks • Introduction • Symmetric Key Cryptography: • Introduction • AES • Side Channel Attacks on AES • Fault Attacks • Fault Attacks on AES

  19. Symmetric Key Algorithms

  20. Data Encryption 53 • Scrambling of data with an algorithm and a secret key • Decryption requires having the same secret key • The encryption algorithm is not required to be secret • In fact, Kerckhoffs’s principle states that: • Security must rely only on the secrecy of the key • Violating this principle is called security by obscurity • Knowledge of plaintext/ciphertext pairs should be useless for the attacker • Some information leaks independently of encryption: • Number of messages exchanged • Length of messages

  21. Symmetric Key Cryptography 54 • The encryption key is also used for decryption • It must be kept secret!

  22. AES

  23. AES Standardization 56 • The Advanced Encryption Standard (AES) is the result of a competition for a symmetric algorithm, launched by NIST to replace DES • After a 4-year competition run by NIST among 15 candidates, the selected algorithm was Rijndael, designed by two Belgian cryptographers, Vincent Rijmen and Joan Daemen

  24. AES Overview 57 • Substitution-permutation network block cipher • Iterates a “round” several times • A round is made of a series of round operations • Decryption is done by applying the inverse round operations in reverse order • 128 bits of state (viewed as a 4 x 4 byte matrix) • Key sizes of 128, 192, 256 bits • With 10, 12, 14 rounds respectively • Each round uses a different round key generated by a key schedule procedure • Round keys are always 128 bits

  25. AES Block Cipher 58 • Input block: 128 bits • Key: 128, 192 or 256 bits • Output block: 128 bits

  26. AES Input Mapping 59 • Input is a block of 128 bits which gets mapped, column by column, into a 4x4 byte matrix • Plaintext = 0x00010203040506070809101112131415 • Row 0: 00 04 08 12 • Row 1: 01 05 09 13 • Row 2: 02 06 10 14 • Row 3: 03 07 11 15
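A short sketch of this column-major mapping, using the plaintext from the slide. The helper name `bytes_to_state` is an illustrative choice.

```python
def bytes_to_state(block: bytes):
    """Map a 16-byte block into a 4x4 state: byte i goes to row i % 4, column i // 4."""
    if len(block) != 16:
        raise ValueError("AES blocks are 16 bytes long")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


plaintext = bytes.fromhex("00010203040506070809101112131415")
state = bytes_to_state(plaintext)
state_rows_hex = [" ".join(f"{b:02x}" for b in row) for row in state]
```

Printing `state_rows_hex` line by line reproduces the four rows listed on the slide.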

  27. AES Algorithm • PLAINTEXT → AddRoundKey → repeated rounds of (SubBytes → ShiftRows → MixColumns → AddRoundKey) → last round (SubBytes → ShiftRows → AddRoundKey) → CIPHERTEXT • The Key Schedule is a separate part of the AES algorithm which, given a key (128, 192 or 256 bit), generates the 128-bit round keys: one per round (10, 12 or 14) plus one for the initial AddRoundKey • Each round key is used in a different round

  28. AES SubBytes 61 • Byte by Byte Substitution (Permutation) • Highly non-linear • Most often implemented as look up table • Invertible, by using another look up table

  29. AES ShiftRows 62 • Simply rotate rows • The inverted operation rotates rows in the opposite way • Provides diffusion by mixing contributions of different columns
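As a sketch of the row rotation: in AES, row r of the state is rotated left by r positions, and the inverse rotates right by the same amount. The state below holds the byte indices 0 to 15 in the column-major layout used in this deck; the function names are illustrative.

```python
def shift_rows(state):
    """Rotate row r of the 4x4 state left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r of the 4x4 state right by r positions."""
    return [row[-r:] + row[:-r] if r else row[:] for r, row in enumerate(state)]


index_state = [[0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15]]
shifted = shift_rows(index_state)
restored = inv_shift_rows(shifted)
```

After the shift, each column contains bytes coming from four different input columns, which is the diffusion effect mentioned above.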

  30. AES MixColumns 63 • Every output byte depends on all 4 input bytes • Provides diffusion • Linear and invertible transformation
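A minimal sketch of MixColumns on a single column, assuming the standard AES definition (multiplication by the fixed matrix with entries 2, 3, 1, 1 over GF(2^8) with reduction polynomial 0x11B). The helper names `xtime` and `mix_column` are illustrative.

```python
def xtime(a: int) -> int:
    """Multiply by 2 in GF(2^8) with the AES reduction polynomial 0x11B."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mix_column(col):
    """Apply the AES MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,  # 2*a0 + 3*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,  # a0 + 2*a1 + 3*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),  # a0 + a1 + 2*a2 + 3*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),  # 3*a0 + a1 + a2 + 2*a3
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
```

Each output byte is a combination of all four input bytes, and since every step is an XOR or a fixed field multiplication, the transformation is linear.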

  31. AES AddRoundKey 64 • AddRoundKey is an XOR between the 128-bit state and the 128-bit round key

  32. Implementations 65 • SW • Key Schedule computed in advance and all round keys stored in RAM • Trade-off between size and speed • Only SubBytes LUT, no LUT for MixColumns (256B + 256B) • LUT SubBytes + MixColumns (1024B + 1024B) • LUT SubBytes + ShiftRows + MixColumns (4096B + 4096B) • And dedicated CPU instructions • Intel’s AES-NI • ARM Neon Crypto Extension (ARMv8-A) • HW • Key Schedule computed on the fly in parallel to the AES round • AES round can have an 8, 32 or 128 bit datapath • Requires 1, 4 or 16 SubBytes instances respectively • Sbox can be a LUT or combinatorial (with different options)

  33. Power Analysis on AES 66

  34. DPA on AES (1/3) 67 • We need to identify our sensitive variable • We need a value based on a part of the key and something we know • What do we know? • Only plaintexts and/or ciphertexts • We can focus on the first round Sbox (PLAINTEXT → AddRoundKey with KEY → SubBytes) • Which is Sbox(Plaintext XOR Key) • Sbox(P[0] XOR Key[0]) depends on the plaintext and a single byte of the key • We only need 2^8 = 256 hypotheses
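One practical reason to target the Sbox output rather than the plain XOR is that wrong key guesses then produce hypotheses that look much less like the correct one. The sketch below checks this noise-free over all 256 plaintext bytes. It builds the AES S-box from its definition (inverse in GF(2^8) followed by the affine map); the key byte `0x2B` and all function names are arbitrary illustrative choices.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8), computed as a^254 (0 maps to 0)."""
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result if a else 0


def rotl8(x, shift):
    return ((x << shift) | (x >> (8 - shift))) & 0xFF


def sbox_entry(x):
    """AES S-box: field inverse followed by the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def hw(value):
    return bin(value).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def hypothesis_similarity(intermediate, key, wrong_key):
    """Correlation between the HW hypotheses of the correct and a wrong key."""
    right = [hw(intermediate(p ^ key)) for p in range(256)]
    wrong = [hw(intermediate(p ^ wrong_key)) for p in range(256)]
    return pearson(right, wrong)


KEY_BYTE = 0x2B  # arbitrary example key byte
xor_similarity = hypothesis_similarity(lambda v: v, KEY_BYTE, KEY_BYTE ^ 0x01)
sbox_similarity = hypothesis_similarity(lambda v: SBOX[v], KEY_BYTE, KEY_BYTE ^ 0x01)
```

For the XOR target, a guess that is wrong in one bit still correlates at 0.75 with the correct hypothesis; after the highly non-linear SubBytes the similarity is much lower, so the correct key stands out with fewer traces.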

  35. DPA on AES: (1/3) 68 • Collect the side channel of the execution of the algorithm, providing different plaintexts P • P_0 → Trace_0 • P_1 → Trace_1 • … • P_n → Trace_n • Identify a sensitive variable in the algorithm: P[0] XOR Key[0] • For all P_0…P_n, and for all m = 256 possible values j of Key[0] (j = 0…255), compute HW(P_i[0] XOR j) • Create a table of guesses: one row per plaintext P_i (i = 0…n), one column per key guess j, with entry (i, j) = HW(P_i[0] XOR j)

  36. DPA: Basic Idea (2/3) 69 • Create a matrix with the traces: one row per trace (n rows), one column per time sample • For each column (time sample), compute the correlation coefficient with every column of the guess table • The result is indexed by key guess and time sample

  37. DPA: Basic Idea (3/3) 70 • The result is a matrix of correlation traces (one per key guess, one column per time sample) • In (m-1) correlation traces we correlated the side-channel traces with intermediate values that are never computed • Because the key guess is wrong • So it is like correlating with a random vector • Expected correlation is close to zero • But in one correlation trace we correlated the side-channel traces with intermediate values that are actually computed • At the point in time when our sensitive variable is computed, we expect a peak towards 1

  38. First Round Attack (1/2) 71

  39. First Round Attack (2/2) 72

  40. Countermeasures 73 • Dual Rail Logic • Introduces a different implementation of logic gates • Goal is to have a power consumption independent of the data • Drawbacks: complex, ad-hoc EDA tools, size, glitches • Execution Time Randomization • Introduces random delays in the computation • Goal is to break the trace synchronization required by DPA • Drawbacks: random generation, slow, can be resynchronized • Data Randomization (Masking) • The input (plaintext) is randomly masked at each execution • Goal is to make the SV depend on an unknown random value • Drawbacks: random generation, slow, second order attacks
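The masking idea can be sketched for the simplest sensitive variable, `P XOR Key`, using a Boolean (XOR) mask. This is an illustration only: the function names are hypothetical, and a real masked AES also has to carry the mask through SubBytes, which this sketch does not cover.

```python
import secrets


def masked_sensitive_value(plaintext_byte, key_byte, mask):
    """Compute (P XOR mask) XOR K, so that P XOR K never appears unmasked."""
    masked_plaintext = plaintext_byte ^ mask
    return masked_plaintext ^ key_byte


def unmask(masked_value, mask):
    """Remove the Boolean mask once the protected computation is finished."""
    return masked_value ^ mask


P, K = 0x41, 0x2B  # arbitrary example values
fresh_mask = secrets.randbelow(256)  # a new random mask at each execution
masked = masked_sensitive_value(P, K, fresh_mask)
recovered = unmask(masked, fresh_mask)

# Over all possible masks the processed value takes every byte value once,
# so on its own it carries no information about P XOR K.
values_seen = {masked_sensitive_value(P, K, m) for m in range(256)}
```

An attacker who combines the leakage of the masked value with the leakage of the mask itself can still recover information, which is the second order attack listed among the drawbacks.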

  41. Agenda 74 • Side Channel Attacks • Introduction • Symmetric Key Cryptography: • Introduction • AES • Side Channel Attacks on AES • Fault Attacks • Fault Attacks on AES

  42. Fault Attacks

  43. Accidental Faults 76 • Electronic devices are subject to (usually) rare faults • Caused by the environment • Unexpected temperature, ionizing particles, power grid glitches, electrostatic discharges… • Timeline (1950s to 2020s): • Ground nuclear testing: anomalies in electronic monitoring equipment • Aerospace industry: problems in space electronics • Supercomputers: errors appear in large memories • Critical systems: problems in cars, health, voting devices • Smaller systems: half of embedded designs are safety relevant • Trend: from random bit flips in memory to random errors in logic as transistor size decreases

  44. From Accidental to Intentional Faults 77 • Attacker idea: provoke and control a fault to perturb the device at the right time (e.g. skip a check, produce a bad result) • And exploit the fault to break security! • Bypass secure boot, secure firmware upgrade checks • Change device state, get cryptographic algorithm keys, … • Usually HW is trusted; SW does not expect it to fail • Can bypass SW protections this way • Often the only way to attack bug-free SW • [Figure: PIN-check flowchart with nodes “Is PIN OK?” (yes/no), “Continue”, “Increment Counter”, “Error”] • Brief history • Late 1990s: unlock pay TV smart cards • 2000s: bypass game protection on consoles • Late 2000s: protection mandatory for set-top boxes • Late 2010s: more and more public attacks on IoT devices • Labs trained on smart cards looking for new targets

  45. Faults Exploitation 78 • Fault Model • Registers, Logic, Flash, RAM… • Single bit, few bits, word.. • Stuck at 0 or 1, flip, random • Precise/loose/random control on location & timing • Transient, permanent, destructive • Multiple faults • Instruction skip, force jump… • Target • Stored Data • Computations • Crypto • Program Flow Source https://wp-systeme.lip6.fr/jaif/wp-content/uploads/sites/8/2018/05/KH-29-05-2018-JAIF.pdf

  46. How to Inject Faults? 79 • Non-invasive methods (temperature, voltage undersupply, clock glitch, voltage glitch, electromagnetic pulses) • No physical damage to chip • Modify working conditions • Moderate knowledge/equipment • Semi-invasive methods (laser) • Chip de-capsulation • Milling, etching, cleaning • Affordable equipment • Often requires building custom boards • Invasive methods • Establish electrical contact to chip (FIB) • Modification, destruction, … • Expensive equipment, e.g. semiconductor diagnostics • Source: https://www.cosic.esat.kuleuven.be/summer_school_sardinia_2015/slides/Balasch.pdf

  47. Temperature & Particles 80 • Temperature • Heating causes combinatorial logic to slow down • Data not yet ready when sampled • May be used to increase sensitivity to other injection methods • Particles “toy” example • Smoke detector used to perturb smart cards • Getting harder for particles to go through the package • Neither is precise, and neither is used in practice

  48. Voltage Undersupply 81 • Low voltage causes combinatorial logic to slow down • Data not yet ready when sampled ! • Not very precise in time & space (location) • Can be used to get out of infinite loops for instance • Used to unlock Pay TV Smart Cards in 1990s source: https://www.cosic.esat.kuleuven.be/summer_school_sardinia_2015/slides/Balasch.pdf

  49. Clock Glitch 82 • Requires a simple signal generator • Attack the precise clock cycle of the targeted instruction • As if the instruction had less time to complete • Data not ready when latched • Affects everything synchronized by this clock • But only works if the CPU runs from an external clock • [Figure: CLOCK waveform aligned with instructions N-2, N-1, N, N+1, N+2]

  50. Voltage Glitch 83 • Affects everything powered by the perturbed VCC pin • Attack the target instruction when it is executed • Combinatorial logic slowed down by low voltage • Data not yet ready when sampled • Must explore to find the right glitch parameters • Width, depth, time • Board and chip capacitors may filter or degrade the glitch • Can be deployed through mod-chips soldered on the board • Usually the most dangerous non-invasive fault injection method • [Figure: VCC waveform aligned with instructions N-2, N-1, N, N+1, N+2]

  51. Effects 84 • Wrong data is sampled • Fault slows down combinatorial logic • Or provokes early latch • => Result sampled before it’s ready • Critical path violation • Global impact (whole chip) • Time may be finely adjusted • Perturb logic when it’s used

  52. Electromagnetic Pulses 85 • Shot location on chip (not very precise) • Internal clock & power line • Random Number Generator • Specific security IP • Processor, memory, bus… • Probably broader fault model • Not fully understood yet • Many configurable parameters • Probe (coil area, core magnetic permeability) • Position (X,Y,Z) • Pulse amplitude and width

  53. Our Bench: Electromagnetic Fault Injection 86 • Pulse generator • 6 ns to 100 ns duration • 400 V (single polarity) • DSO • 2.5 GHz • 40 MS • WB amplifier • 1 GHz • XYZ stages • EM probe (analysis) • STM32F103 Discovery board

  54. Laser (1/2) 87 • Shoot a very precise location on the chip • Down to 1 µm • Many configurable parameters • Position (X,Y) • Wavelength, spot size • Energy / peak power • Pulse vs continuous • … • Search space grows exponentially • Requires knowing where to shoot • Or exhaustive tries over the whole chip surface

  55. Laser (2/2) 88 • Very localized effect • Very broad range of possible effects • Bit(s) flipped/stuck in RAM, registers, logic, flash … • => Harder to protect against • But usually the attack is expensive • De-capsulating chips, including thinning • Complex synchronization HW • Very often requires attacking from the backside • Custom HW & boards • A few months to set up HW, SW • Target critical assets • Retrieve global secrets (global keys, sensitive FW IP…) • “Break one break all” • First used to break smart cards, then set-top boxes; are micros next?

  56. Our Bench: Laser Fault Injection 89 • Quicklaze-50 STII (ESI) • Nd-YAG laser crystal • 3 wavelengths : • UV3(355nm) Green(532nm) IR(1064nm) • fixed pulse duration : 5ns • Mitutoyo lens: • IR : x50; Green : X20; UV : x50 • Min spotsize : 1µm x 1µm • XY stage : min step=0.1µm

  57. A Few Exploitation Examples 90 • Retrieving cryptographic keys • Electromagnetic pulse on AES round number [Dehbaoui et al., COSADE 2013] • Usually attacks on crypto require access to a few faulted results • Bypassing secure boot • Laser shot on Android phone TrustZone NS bit [Alphanov, FDTC 2017] • Taking over a device • Voltage glitch to control Program Counter on STM32 [Riscure, FDTC 2016] • Privilege escalation • Voltage glitch to get root on Linux [Riscure, FDTC 2017] • Voltage glitch “Chip Whisperer” practice platform for students • Based on STM32, can also be used to attack STM32s with provided boards

  58. Fault Attack against AES

  59. Differential Fault Analysis 92 • The device under attack executes a cryptographic operation • It involves a secret key (target of the attack) • Comparing correct data with faulted data may reveal information about the secret key • The attacker needs the output of: • A normal operation involving an input and the secret key • A faulted operation with the same input and the same secret key

  60. Giraud’s Attack 93 • Goal : recover the last round key • Use the last round key to recover the cipher key of AES-128 • Fault model : random single-bit corruption at the beginning of the last round • Before SubBytes

  61. Giraud’s Attack 94 • [Figure: last-round data path on 4×4 byte states indexed 0 to 15 column-wise: A → SubBytes (SB) → B → ShiftRows (SR) → C → AddRoundKey (ARK) with K_Nr → D]

  62. Giraud’s Attack 95 • [Figure: last-round data path A → SB → B → SR → C → ARK with K_Nr → D, with a fault ε injected on one byte of state A]

  63. Giraud’s Attack 96 • [Figure: last-round data path A → SB → B → SR → C → ARK with K_Nr → D; the fault ε on A becomes the difference ε′ on B after SubBytes]

  64. Giraud’s Attack 97 • [Figure: last-round data path A → SB → B → SR → C → ARK with K_Nr → D; the difference ε′ on B is moved by ShiftRows and appears on C]

  65. Giraud’s Attack 98 • [Figure: last-round data path A → SB → B → SR → C → ARK with K_Nr → D; the difference ε′ on C passes through AddRoundKey unchanged and appears on the output D]

  66. Giraud’s Attack 99 • [Figure: last-round data path A → SB → B → SR → C → ARK with K_Nr → D; the difference ε′ on C passes through AddRoundKey unchanged and appears on the output D]

  67. Giraud’s Attack 100 • Pre-compute the table: for each value val = 0x00…0xFF of the byte, and for each fault ε ∈ {0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80}, compute Δ = SubBytes(val) ⊕ SubBytes(val ⊕ ε) • For each fault, looking for the val where ε′ = Δ provides 8 entries on average • 3 faults on one byte allow identifying the correct val of the state • Key = ciphertext ⊕ SubBytes(val) • The sequence must be repeated for each byte
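The procedure above can be simulated for a single byte. The sketch below is an illustration under stated assumptions, not the original attack code: it ignores ShiftRows (which only moves the byte), simulates the faulty ciphertext bytes instead of injecting real faults, and uses hypothetical values for the state byte (`0x3A`), the last round key byte (`0xC5`) and the injected faults.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def sbox_entry(x):
    """AES S-box: inverse in GF(2^8) (as x^254) followed by the affine map."""
    inv = 1
    for _ in range(254):
        inv = gf_mul(inv, x)
    if x == 0:
        inv = 0
    rot = lambda v, s: ((v << s) | (v >> (8 - s))) & 0xFF
    return inv ^ rot(inv, 1) ^ rot(inv, 2) ^ rot(inv, 3) ^ rot(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]
SINGLE_BIT_FAULTS = [1 << i for i in range(8)]


def candidates(observed_difference):
    """All val for which some single-bit fault explains the observed difference."""
    return {
        val
        for val in range(256)
        for fault in SINGLE_BIT_FAULTS
        if SBOX[val] ^ SBOX[val ^ fault] == observed_difference
    }


def recover_key_byte(correct_ct, faulty_cts):
    """Intersect the candidate sets of all faults, then derive the key byte(s)."""
    remaining = set(range(256))
    for faulty_ct in faulty_cts:
        remaining &= candidates(correct_ct ^ faulty_ct)
    return {correct_ct ^ SBOX[val] for val in remaining}


# Simulated device: state byte before the last SubBytes and last round key byte.
TRUE_VAL, TRUE_KEY = 0x3A, 0xC5
correct_ct = SBOX[TRUE_VAL] ^ TRUE_KEY
faulty_cts = [SBOX[TRUE_VAL ^ fault] ^ TRUE_KEY for fault in (0x01, 0x10, 0x80)]
key_candidates = recover_key_byte(correct_ct, faulty_cts)
```

The correct key byte is always among `key_candidates`; if more than one candidate remains after three faults, further faulty ciphertexts narrow the set down.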
