Consumption Model • Instantaneous power consumption in digital CMOS devices: • $P(t) = P_{const}(t) + P_{instr}(t) + P_{data}(t) + P_{noise}(t)$ • $P_{const}(t)$ is constant and therefore unimportant for DPA • $P_{instr}(t)$ is fixed by the particular instruction executed • $P_{data}(t)$ is due to the data currently being processed • $P_{noise}(t)$ has to be minimized • DPA exploits the variations of $P(t)$ caused by $P_{data}(t)$ • The basic idea is to relate the device's power consumption to the values it processes
Hamming Weight Model • Tries to estimate $P_{data}(t)$ • Based on the assumption that a bit set to 1 consumes more than a bit set to 0 • Very simple model • Yet still in use today • Sometimes the Hamming Distance Model is preferable • It measures the transitions of a signal or register • Transitions are bits changing their value
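The two leakage models above can be sketched in a few lines of Python. The function names are illustrative choices, not taken from the slides.

```python
def hamming_weight(value: int) -> int:
    """Number of bits set to 1 in a non-negative integer."""
    return bin(value).count("1")


def hamming_distance(before: int, after: int) -> int:
    """Number of bits that toggle when a register goes from `before` to `after`."""
    return hamming_weight(before ^ after)


print(hamming_weight(0xA5))          # 0xA5 = 1010 0101 -> 4
print(hamming_distance(0x0F, 0xF0))  # all 8 bits toggle -> 8
```

The Hamming distance model needs the previous content of the register, which is why it usually requires more knowledge of the implementation than the Hamming weight model.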
Sensitive Variable • A DPA attack works if a relation exists between the power consumption and a target “sensitive variable” • A sensitive variable is a value that is: • Actually computed during the execution • A combination of: • A portion of the key (e.g. 1 bit, 1 byte) • A value known to the attacker that changes at every execution (e.g. the input)
DPA: Basic Idea (1/3) • Collect the side channel of the execution of the algorithm for different inputs: Input_0 → Trace_0, Input_1 → Trace_1, …, Input_n → Trace_n • Identify a sensitive variable in the algorithm • E.g. SV = Input[0] XOR Key[0] • Our target will be Key[0] • For all inputs Input_0 … Input_n, and for all possible values j = 0 … m of Key[0], compute HW(Input_i[0] XOR j) • Create a table of guesses (one row per input, one column per key guess):
Input \ Key guess | 0 | 1 | … | m
Input_0 | HW(Input_0[0] XOR 0) | HW(Input_0[0] XOR 1) | HW(Input_0[0] XOR …) | HW(Input_0[0] XOR m)
Input_1 | HW(Input_1[0] XOR 0) | HW(Input_1[0] XOR 1) | HW(Input_1[0] XOR …) | HW(Input_1[0] XOR m)
… | HW(Input_…[0] XOR 0) | HW(Input_…[0] XOR 1) | HW(Input_…[0] XOR …) | HW(Input_…[0] XOR m)
Input_n | HW(Input_n[0] XOR 0) | HW(Input_n[0] XOR 1) | HW(Input_n[0] XOR …) | HW(Input_n[0] XOR m)
DPA: Basic Idea (2/3) • Create a matrix with the n traces (one row per trace, one column per time sample) • For each column (time sample), compute the correlation coefficient with every column of the guess table (one column per key guess)
DPA: Basic Idea (3/3) • The result is a matrix of correlation traces (one per key guess, one column per time sample) • In all the correlation traces but one, we correlated the side channel traces with intermediate variables which are never computed • Because the key guess is wrong • So it is like correlating with a random vector • Expected correlation is close to zero • But in one correlation trace we correlated the side channel traces with intermediate variables that are actually computed • At some point in time, when our sensitive variable is computed, we expect a peak towards 1
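The three steps above can be tried on simulated traces. The following is a minimal sketch under stated assumptions, not a real acquisition: traces are Gaussian noise, the device leaks HW(Input[0] XOR Key[0]) at a single sample, and all names and parameters (`simulate_traces`, `LEAK_SAMPLE`, the noise level, the number of traces) are hypothetical.

```python
import random

N_TRACES, N_SAMPLES, LEAK_SAMPLE = 400, 10, 7   # assumed simulation parameters
SECRET_KEY_BYTE = 0x3C                          # what the attack should recover


def hw(x):
    return bin(x).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(inputs, key_byte, rng, noise=0.5):
    """One noisy trace per input; the sensitive variable leaks at LEAK_SAMPLE."""
    traces = []
    for x in inputs:
        trace = [rng.gauss(0.0, noise) for _ in range(N_SAMPLES)]
        trace[LEAK_SAMPLE] += hw(x ^ key_byte)
        traces.append(trace)
    return traces


def correlation_matrix(inputs, traces):
    """corr[guess][sample]: correlation of the guess column with the trace column."""
    columns = [[t[s] for t in traces] for s in range(N_SAMPLES)]
    corr = []
    for guess in range(256):
        model = [hw(x ^ guess) for x in inputs]   # one column of the guess table
        corr.append([pearson(model, col) for col in columns])
    return corr


rng = random.Random(1)
inputs = [rng.randrange(256) for _ in range(N_TRACES)]
traces = simulate_traces(inputs, SECRET_KEY_BYTE, rng)
corr = correlation_matrix(inputs, traces)

# The highest (signed) correlation identifies both the key byte and the instant.
best_guess, best_sample = max(
    ((g, s) for g in range(256) for s in range(N_SAMPLES)),
    key=lambda gs: corr[gs[0]][gs[1]],
)
print(hex(best_guess), best_sample)
```

With a purely linear target such as XOR, the bitwise complement of the key gives the same peak with opposite sign, so the sketch takes the largest signed value rather than the largest absolute value; this assumes that consumption grows with the Hamming weight.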
Workbench for Power Analysis
SPEAr board • New resistor R in series with the SoC power supply • GPIO used for trigger
Oscilloscope • Agilent Infiniium • Features: • max 40 GSa/s • max 2M samples • 4 channels • Differential probe • Measures the voltage difference across a resistor • Simple probe • Trigger detection
Workbench • Linux PC: • Commands the board • Cross-compiles for ARM • Oscilloscope: • Waits for trigger • Averages the traces • Saves the trace • SPEAr board: • Runs the crypto algorithm • Generates the trigger
Single Power Trace
Mean of 1000 Power Traces
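Why averaging helps can be stated in one line. This assumes (the slides do not state it) that the N acquisitions repeat the same operation on the same data and that the noise is zero-mean, independent between acquisitions, with variance σ²:

```latex
\bar{P}(t) = \frac{1}{N}\sum_{i=1}^{N} P_i(t)
\qquad\Longrightarrow\qquad
\operatorname{Var}\!\left[\bar{P}_{noise}(t)\right] = \frac{\sigma^2}{N}
```

Under these assumptions, N = 1000 reduces the noise standard deviation by a factor of √1000 ≈ 31.6, while the constant, instruction and data components are unchanged.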
Workbench for EM Analysis • Digital scope: LeCroy WavePro, 40 GS/s, 6 GHz bandwidth • XY stage (resolution up to 0.1 µm) • Wideband amplifier (Miteq + Femto) • EM probes (Langer + handmade)
Timing Attacks
What is a Timing Attack • A side channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms • In some cases, exploitable from remote locations • Effective if the computation time depends on the secret • Requires high-accuracy measurements of the encryption time • Measurement noise and resolution must be smaller than the timing difference we want to measure
Vulnerability comes from… • Sometimes it is a matter of algorithm • Often, algorithms leak information through timing differences because the computational steps depend on data values • Choose a constant-time algorithm to avoid these attacks • E.g. modular exponentiation (we will see it later) can be done with the Square&Multiply algorithm (variable-time) or with Square&Multiply Always (constant-time) • Otherwise, it can be a matter of implementation • A Cache-Timing Attack takes advantage of data-dependent timing variations during cache accesses (a cache miss takes longer) • It exploits implementations in which secret data is used as an array index (e.g. AES Sbox) • Almost every implementation can be made constant-time in order to avoid these attacks
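The difference between the two exponentiation algorithms named above can be illustrated by counting multiplications. This is a sketch only: the function names are illustrative, and Python integers are not constant-time, so the counter shows the operation sequence rather than real timing behaviour.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation: multiplies only when the bit is 1."""
    result, multiplications = 1, 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        if bit == "1":                      # work depends on the secret bit
            result = (result * base) % modulus
            multiplications += 1
    return result, multiplications


def square_and_multiply_always(base, exponent, modulus):
    """Same result, but one multiplication per bit regardless of its value."""
    result, multiplications = 1, 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        product = (result * base) % modulus  # always computed
        multiplications += 1
        result = product if bit == "1" else result
    return result, multiplications


# Two 7-bit exponents with very different Hamming weights.
print(square_and_multiply(7, 0b1000001, 1009)[1])         # 2 multiplications
print(square_and_multiply(7, 0b1111111, 1009)[1])         # 7 multiplications
print(square_and_multiply_always(7, 0b1000001, 1009)[1])  # 7 multiplications
```

In the variable-time version the number of multiplications equals the Hamming weight of the exponent, which is exactly the kind of secret-dependent timing the slide warns about.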
Timing attack chart example
Agenda • Side Channel Attacks • Introduction • Symmetric Key Cryptography: • Introduction • AES • Side Channel Attacks on AES • Fault Attacks • Fault Attacks on AES
Symmetric Key Algorithms
Data Encryption • Scrambling of data with an algorithm and a secret key • Decryption requires having the same secret key • The encryption algorithm is not required to be secret • In fact, Kerckhoffs’s principle states that: • Security must rely only on the secrecy of the key • Violating this principle is called security by obscurity • Knowledge of plaintext/ciphertext pairs should be useless to the attacker • Some information leaks independently of encryption: • Number of messages exchanged • Length of messages
Symmetric Key Cryptography • The encryption key is also used for decryption • It must be kept secret!
AES
AES Standardization • The Advanced Encryption Standard (AES) is the result of a competition for a symmetric algorithm, launched by NIST to replace DES • After a 4-year competition run by NIST among 15 candidates, the selected algorithm was Rijndael, designed by two Belgian cryptographers, Vincent Rijmen and Joan Daemen
AES Overview • Substitution-permutation network block cipher • Iterates a “round” several times • A round is made of a series of round operations • Decryption is done by applying the inverse round operations in reverse order • 128 bits of state (viewed as a 4 x 4 byte matrix) • Key sizes of 128, 192, 256 bits • With 10, 12, 14 rounds respectively • Each round uses a different round key generated by a key schedule procedure • Round keys are always 128 bits
AES Block Cipher • Plaintext block: 128 bits • Key: 128 or 192 or 256 bits • Ciphertext block: 128 bits
AES Input Mapping • Input is a block of 128 bits which gets mapped into a 4x4 byte matrix, column by column
Plaintext = 0x00010203040506070809101112131415
00 04 08 12
01 05 09 13
02 06 10 14
03 07 11 15
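The column-by-column mapping can be written down directly. A minimal sketch with illustrative helper names (`bytes_to_state`, `state_to_bytes`), using the plaintext shown on the slide:

```python
def bytes_to_state(block: bytes):
    """Map 16 input bytes to a 4x4 state: state[row][col] = block[row + 4*col]."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte blocks")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def state_to_bytes(state) -> bytes:
    """Inverse mapping: read the state back column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))


plaintext = bytes.fromhex("00010203040506070809101112131415")
state = bytes_to_state(plaintext)
for row in state:
    print(" ".join(f"{b:02x}" for b in row))   # first row prints: 00 04 08 12
```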
AES Algorithm • PLAINTEXT → AddRoundKey → Round: SubBytes → ShiftRows → MixColumns → AddRoundKey → Last Round: SubBytes → ShiftRows → AddRoundKey → CIPHERTEXT • KEY → Key Schedule → round keys • The Key Schedule is a separate part of the AES algorithm which, given a key (128, 192, 256 bit), generates (10, 12, 14) 128-bit round keys • Each round key is used in a different round
AES SubBytes • Byte-by-byte substitution (a permutation of the byte values) • Highly non-linear • Most often implemented as a look-up table • Invertible, by using another look-up table
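The two look-up tables can be generated instead of typed in. The sketch below builds the S-box from its standard definition (multiplicative inverse in GF(2^8) followed by an affine map) and derives the inverse table from it; the brute-force inverse search is chosen for readability, not speed, and the helper names are illustrative.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1 (0x11B)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n positions."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    sbox = []
    for value in range(256):
        # Multiplicative inverse; 0 has no inverse and maps to 0.
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        # Affine transformation of the inverse.
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()

# The inverse table simply swaps index and value.
INV_SBOX = [0] * 256
for index, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = index

print(hex(SBOX[0x00]), hex(SBOX[0x53]))   # 0x63 0xed
```

Note that indexing a table with a secret-dependent value is precisely what cache-timing attacks exploit, so a table-based SubBytes is convenient but not automatically constant-time.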
AES ShiftRows • Simply rotates the rows • The inverse operation rotates the rows in the opposite direction • Provides diffusion by mixing contributions of different columns
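A short sketch of the rotation and its inverse, assuming the state is stored as a list of four rows (row r is rotated left by r positions, as in standard AES):

```python
def shift_rows(state):
    """Rotate row r of a 4x4 state left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Rotate row r right by r positions, undoing shift_rows."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]


state = [[0, 4, 8, 12],
         [1, 5, 9, 13],
         [2, 6, 10, 14],
         [3, 7, 11, 15]]
shifted = shift_rows(state)
print(shifted[1])   # [5, 9, 13, 1]
```

After the rotation, each column contains bytes that came from four different columns, which is the diffusion effect mentioned above.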
AES MixColumns • Every output byte depends on all 4 input bytes of its column • Provides diffusion • Linear and invertible transformation
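For one column, the transformation is a fixed matrix multiplication in GF(2^8) with coefficients 2, 3, 1, 1. A minimal sketch with illustrative function names; the input column is a commonly used test vector:

```python
def xtime(a):
    """Multiply by 2 in GF(2^8) modulo 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a


def mix_single_column(column):
    """Apply the MixColumns matrix (rows 2 3 1 1, rotated) to one 4-byte column."""
    a0, a1, a2, a3 = column
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


mixed = mix_single_column([0xDB, 0x13, 0x53, 0x45])
print([hex(b) for b in mixed])   # ['0x8e', '0x4d', '0xa1', '0xbc']
```

Multiplication by 3 is computed as `xtime(a) ^ a`, so the whole operation needs only shifts and XORs, which is why it is linear.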
AES AddRoundKey • AddRoundKey is an XOR between the 128-bit state and the 128-bit round key
Implementations • SW • Key Schedule computed in advance and all round keys stored in RAM • Trade-off between size and speed • Only SubBytes LUT, no LUT for MixColumns (256B + 256B) • LUT SubBytes + MixColumns (1024B + 1024B) • LUT SubBytes + ShiftRows + MixColumns (4096B + 4096B) • And dedicated CPU instructions • Intel’s AES-NI • ARM Neon Crypto Extension (ARMv8-A) • HW • Key Schedule computed on the fly in parallel to the AES round • AES round can have an 8, 32 or 128 bit datapath • Requires 1 SubBytes, 4 SubBytes or 16 SubBytes • Sbox can be a LUT or combinatorial (with different options)
Power Analysis on AES
DPA on AES (1/3) • We need to identify our sensitive variable • We need a value based on a part of the key and something we know • What do we know? • Only plaintexts and/or ciphertexts • We can focus on the first round Sbox (PLAINTEXT → AddRoundKey with KEY → SubBytes) • Which is Sbox(Plaintext XOR Key) • Sbox(P[0] XOR Key[0]) depends on the plaintext and a single byte of the key • We only need 2^8 = 256 hypotheses
DPA on AES (1/3) • Collect the side channel of the execution of the algorithm for different plaintexts P: P_0 → Trace_0, P_1 → Trace_1, …, P_n → Trace_n • Identify a sensitive variable in the algorithm: P[0] XOR Key[0] • For all plaintexts P_0 … P_n, and for all possible values j of Key[0] (j = 0 … m, with m = 255), compute HW(P_i[0] XOR j) • Create a table of guesses (one row per plaintext, one column per key guess):
Input \ Key guess | 0 | 1 | … | m
P_0 | HW(P_0[0] XOR 0) | HW(P_0[0] XOR 1) | HW(P_0[0] XOR …) | HW(P_0[0] XOR m)
P_1 | HW(P_1[0] XOR 0) | HW(P_1[0] XOR 1) | HW(P_1[0] XOR …) | HW(P_1[0] XOR m)
… | HW(P_…[0] XOR 0) | HW(P_…[0] XOR 1) | HW(P_…[0] XOR …) | HW(P_…[0] XOR m)
P_n | HW(P_n[0] XOR 0) | HW(P_n[0] XOR 1) | HW(P_n[0] XOR …) | HW(P_n[0] XOR m)
DPA: Basic Idea (2/3) • Create a matrix with the n traces (one row per trace, one column per time sample) • For each column (time sample), compute the correlation coefficient with every column of the guess table (one column per key guess)
DPA: Basic Idea (3/3) • The result is a matrix of correlation traces (one per key guess, one column per time sample) • In all the correlation traces but one, we correlated the side channel traces with intermediate variables which are never computed • Because the key guess is wrong • So it is like correlating with a random vector • Expected correlation is close to zero • But in one correlation trace we correlated the side channel traces with intermediate variables that are actually computed • At some point in time, when our sensitive variable is computed, we expect a peak towards 1
First Round Attack (1/2)
First Round Attack (2/2)
Countermeasures • Dual Rail Logic • Introduces a different implementation of logic gates • Goal is to have a power consumption independent of the data • Drawbacks: complex, ad-hoc EDA tools, size, glitches • Execution Time Randomization • Introduces random delays in the computation • Goal is to break the trace synchronization required by DPA • Drawbacks: random generation, slow, traces can be resynchronized • Data Randomization (Masking) • The input (plaintext) is randomly masked at each execution • Goal is to make the SV depend on an unknown random value • Drawbacks: random generation, slow, second-order attacks
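The masking idea can be sketched for the linear part of the first round. This is an illustration under assumptions, not a complete masked AES: only the XOR with the key is shown, the function name is hypothetical, and a real implementation also needs a masked SubBytes (for example a table recomputed for each mask), which is omitted here.

```python
import secrets


def masked_add_round_key(plaintext_byte, key_byte):
    """XOR the key into a randomly masked input byte.

    The device only ever manipulates (plaintext ^ mask) and
    (plaintext ^ mask ^ key); the unmasked value plaintext ^ key
    never appears as an intermediate.
    """
    mask = secrets.randbits(8)              # fresh random mask per execution
    masked_input = plaintext_byte ^ mask
    masked_sv = masked_input ^ key_byte
    return masked_sv, mask


masked_sv, mask = masked_add_round_key(0x41, 0x2B)
# Removing the mask gives back the ordinary sensitive variable.
print(hex(masked_sv ^ mask))   # 0x6a
```

Because the mask changes at every execution and is unknown to the attacker, the Hamming weight of `masked_sv` alone no longer correlates with the guess table; combining two leakage points (masked value and mask) is what second-order attacks do.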
Fault Attacks
Accidental Faults • Electronic devices are subject to (usually) rare faults • Caused by the environment • Unexpected temperature, ionizing particles, power grid glitches, electrostatic discharges… • Timeline, from the 1950s to the 2020s: • Ground nuclear testing: anomalies in electronic monitoring equipment • Aerospace industry: problems in space electronics • Supercomputers: errors appear in large memories • Critical systems: problems in cars, health, voting devices • Smaller systems: half of embedded designs are safety relevant • Random bit flips in memory • Random errors in logic as transistor size decreases
From Accidental to Intentional Faults • Attacker idea: provoke and control a fault to perturb the device at the right time • And exploit the fault to break security! • Bypass secure boot, secure firmware upgrade checks • Change device state, get cryptographic algorithm keys, … • Usually HW is trusted, SW does not expect it to fail • Can bypass SW protections this way • Often the only way to attack bug-free SW • Brief history • Late 1990s: unlock pay TV smart cards • 2000s: bypass game protection on consoles • Late 2000s: protection mandatory for set-top boxes • Late 2010s: more and more public attacks on IoT devices • Labs trained on smart cards looking for new targets • [Flowchart: “Is PIN OK?”: yes → Continue; no → Increment Counter → Error. Fault effects shown: skip check, bad result.]
Faults Exploitation • Fault Model • Registers, Logic, Flash, RAM… • Single bit, few bits, word… • Stuck at 0 or 1, flip, random • Precise/loose/random control on location & timing • Transient, permanent, destructive • Multiple faults • Instruction skip, force jump… • Target • Stored Data • Computations • Crypto • Program Flow • Source: https://wp-systeme.lip6.fr/jaif/wp-content/uploads/sites/8/2018/05/KH-29-05-2018-JAIF.pdf
How to Inject Faults? • Non-invasive methods (temperature, voltage undersupply, clock glitch, voltage glitch, electromagnetic pulses) • No physical damage to chip • Modify working conditions • Moderate knowledge/equipment • Semi-invasive methods (laser) • Chip de-capsulation • Milling, etching, cleaning • Affordable equipment • Often requires building custom boards • Invasive methods • Establish electrical contact to chip (FIB) • Modification, destruction, … • Expensive equipment, e.g. semiconductor diagnostics • Source: https://www.cosic.esat.kuleuven.be/summer_school_sardinia_2015/slides/Balasch.pdf
Temperature & Particles • Temperature • Heating causes combinatorial logic to slow down • Data not yet ready when sampled • May be used to increase sensitivity to other injection methods • Particles “toy” example • Smoke detector used to perturb smart cards • Getting harder for particles to go through the package • Neither is precise at all, and neither is used in practice
Voltage Undersupply • Low voltage causes combinatorial logic to slow down • Data not yet ready when sampled! • Not very precise in time & space (location) • Can be used to get out of infinite loops, for instance • Used to unlock pay TV smart cards in the 1990s • Source: https://www.cosic.esat.kuleuven.be/summer_school_sardinia_2015/slides/Balasch.pdf
Clock Glitch • Requires a simple signal generator • Attacks the precise clock cycle of the targeted instruction • As if the instruction had less time to complete • Data not ready when latched • Affects everything synchronized by this clock • But only works if the CPU runs from an external clock • [Figure: CLOCK waveform aligned with instructions N-2 to N+2]
Voltage Glitch • Affects everything powered by the perturbed VCC pin • Attacks the target instruction when it is executed • Combinatorial logic slowed down by low voltage • Data not yet ready when sampled • Must explore to find the right glitch parameters • Width, depth, time • Board and chip capacitors may filter or degrade the glitch • Can be deployed through mod-chips soldered on the board • Usually the most dangerous non-invasive fault injection method • [Figure: VCC waveform aligned with instructions N-2 to N+2]
Effects • Wrong data is sampled • Fault slows down combinatorial logic • Or provokes an early latch • => Result sampled before it’s ready • Critical path violation • Global impact (whole chip) • Time may be finely adjusted • Perturb logic when it’s used
Electromagnetic Pulses • Shoot a location on the chip (not very precise) • Internal clock & power line • Random Number Generator • Specific security IP • Processor, memory, bus… • Probably broader fault model • Not fully understood yet • Many configurable parameters • Probe (coil area, core magnetic permeability) • Position (X, Y, Z) • Pulse amplitude and width
Our Bench: Electromagnetic Fault Injection • Pulse generator • 6 ns to 100 ns duration • 400 V (single polarity) • DSO • XYZ stages • 2.5 GHz • 40 MS • EM probe (analysis) • WB amplifier • STM32F103 • 1 GHz Discovery board
Laser (1/2) • Shoot a very precise location on the chip • Down to 1 µm • Many configurable parameters • Position (X, Y) • Wavelength, spot size • Energy / peak power • Pulse vs continuous • … • Search space grows exponentially • Requires knowing where to shoot • Or exhaustive tries over the whole chip surface
Laser (2/2) • Very localized effect • Very broad range of possible effects • Bit(s) flipped/stuck in RAM, registers, logic, flash… • => Harder to protect against • But usually the attack is expensive • Decapsulating chips, including thinning • Complex synchronization HW • Very often requires attacking from the backside • Custom HW & boards • A few months to set up HW and SW • Target critical assets • Retrieve global secrets (global keys, sensitive FW IP…) • “Break one, break all” • First used to break smart cards, then set-top boxes; are micros next?
Our Bench: Laser Fault Injection • Quicklaze-50 STII (ESI) • Nd-YAG laser crystal • 3 wavelengths: • UV3 (355 nm), Green (532 nm), IR (1064 nm) • Fixed pulse duration: 5 ns • Mitutoyo lens: • IR: x50; Green: x20; UV: x50 • Min spot size: 1 µm x 1 µm • XY stage: min step = 0.1 µm
A Few Exploitation Examples • Retrieving cryptographic keys • Electromagnetic pulse on AES round number [Dehbaoui et al., COSADE 2013] • Usually attacks on crypto require access to a few faulted results • Bypassing secure boot • Laser shot on Android phone TrustZone NS bit [Alphanov, FDTC 2017] • Taking over a device • Voltage glitch to control the Program Counter on STM32 [Riscure, FDTC 2016] • Privilege escalation • Voltage glitch to get root on Linux [Riscure, FDTC 2017] • Voltage glitch “ChipWhisperer” practice platform for students • Based on STM32, can also be used to attack STM32s with the provided boards
Fault Attack against AES
Differential Fault Analysis • The device under attack executes a cryptographic operation • It involves a secret key (target of the attack) • The comparison between correct data and faulted data may allow the attacker to derive information about the secret key • The attacker needs the output of: • A normal operation involving an input and the secret key • A faulted operation with the same input and the same secret key
Giraud’s Attack • Goal: recover the last round key • Use the last round key to recover the cipher key of AES-128 • Fault model: random single-bit corruption at the beginning of the last round • Before SubBytes
Giraud’s Attack • [Figure, built up over several slides: the last round on the 4x4 state, A → SubBytes (SB) → B → ShiftRows (SR) → C → AddRoundKey (ARK) with the last round key K_Nr → D. A single-bit fault ε on one byte of A becomes a byte difference ε' after SB; SR moves it to another position and ARK leaves it unchanged, so ε' is visible in one byte of D.]
Giraud’s Attack • Pre-compute the table: • For each val = 0x00 … 0xFF of the byte • For each fault ε = (0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80) • Compute Δ = SubBytes(val) ⊕ SubBytes(val ⊕ ε) • For each fault, looking for the val where ε' = Δ provides 8 entries on average • 3 faults on one byte allow identifying the correct val of the state • Key = ciphertext ⊕ SubBytes(val) • The sequence must be repeated for each byte
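The table-based recovery can be simulated for a single byte. This is a sketch under assumptions rather than the attack on a real device: the last round is reduced to one byte (ShiftRows only changes the position of the byte, so it is ignored), faults are simulated in software, all eight single-bit faults are used instead of a few random ones so that the run is deterministic, and the names `last_round_byte` and `recover_key_candidates` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply in GF(2^8) modulo 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def rotl8(x, n):
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sbox():
    sbox = []
    for value in range(256):
        inv = next((c for c in range(1, 256) if gf_mul(value, c) == 1), 0)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2)
                    ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63)
    return sbox


SBOX = build_sbox()
SINGLE_BIT_FAULTS = [1 << i for i in range(8)]

# Pre-computed table: output difference -> state values that can produce it.
DIFF_TABLE = {}
for val in range(256):
    for fault in SINGLE_BIT_FAULTS:
        delta = SBOX[val] ^ SBOX[val ^ fault]
        DIFF_TABLE.setdefault(delta, set()).add(val)


def last_round_byte(state_byte, key_byte, fault=0):
    """One byte of the last round: SubBytes then AddRoundKey (simulated device)."""
    return SBOX[state_byte ^ fault] ^ key_byte


def recover_key_candidates(correct_ct, faulty_cts):
    """Intersect the state candidates of every fault, then derive the key byte."""
    candidates = set(range(256))
    for faulty_ct in faulty_cts:
        observed_difference = correct_ct ^ faulty_ct      # this is the output difference
        candidates &= DIFF_TABLE.get(observed_difference, set())
    return {correct_ct ^ SBOX[val] for val in candidates}


SECRET_KEY_BYTE, STATE_BYTE = 0x2B, 0x7E      # unknown to the attacker
correct = last_round_byte(STATE_BYTE, SECRET_KEY_BYTE)
faulty = [last_round_byte(STATE_BYTE, SECRET_KEY_BYTE, f) for f in SINGLE_BIT_FAULTS]
key_candidates = recover_key_candidates(correct, faulty)
print([hex(k) for k in key_candidates])
```

The attacker only uses the correct and faulty ciphertext bytes; the injected fault value itself is never needed, only the assumption that it flipped a single bit before SubBytes.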