Amir Ali Kouzeh Geran and Arash Reyhani-Masoleh
Presented by: Arash Reyhani-Masoleh
Department of Electrical and Computer Engineering, Western University, London, Ontario, Canada
23rd IEEE Symposium on Computer Arithmetic (ARITH 23), June 11, 2016

Outline
• Motivation
• Preliminaries
• Single-bit Fault Detection Scheme
• CRC-based Fault Detection Scheme
• Fault Simulation Results
• FPGA Implementations and Overheads
• Conclusion

Motivations: GCM
• Galois/Counter Mode (GCM) is a recently adopted mode of operation for symmetric-key block ciphers (such as AES).
• It was proposed by McGrew and Viega in 2005 and defined by NIST (SP 800-38D) in 2007.
• AES-GCM is included in "NSA Suite B Cryptography".
• It is used in a number of protocols and standards:
  ◦ IEEE 802.1AE, IEEE 802.11ad
  ◦ ANSI (INCITS) Fibre Channel Security Protocols (FC-SP)
  ◦ IEEE P1619.1 tape storage, IETF IPsec standards, SSH and TLS 1.2
• It provides authentication assurance for additional data that is not encrypted.
• It detects accidental modifications and unauthorized alterations of data, and protects confidentiality.

Motivations: Reliable GCM
• Sources of faults in cryptographic systems:
  ◦ Natural faults
  ◦ Fault attacks: inject faults and look for leakage of information.
• The need for a fault detection method:
  ◦ Protect the integrity and authenticity of data.
  ◦ Prevent the attack sequence in case of a fault attack.
• In this paper, we propose a reliable GCM scheme that detects both permanent and transient faults, with:
  ◦ Low overhead in terms of area and delay.
  ◦ Acceptable fault coverage.

Preliminaries
• The GCM has two operations: authenticated encryption and authenticated decryption.
• There are four inputs for authenticated encryption:
  1. A secret key ($K$) whose length is determined by the block cipher.
  2. An initialization vector ($IV$) with a length between 1 and $2^{64}$ bits.
  3. A plaintext ($P$) with any number of bits between 0 and $2^{39} - 256$.
  4. Additional authenticated data ($A$), which is authenticated but not encrypted, with any number of bits between 0 and $2^{64}$.
• There are two outputs for authenticated encryption:
  1. A ciphertext ($C$) whose length is exactly that of the plaintext.
  2. An authentication tag ($T$), whose length can be any value between 0 and 128.

AES-GCM Block Diagram
• The "hash key" $H$ is generated by encrypting 128 zero bits with the symmetric key $K$: $H = E(K, 0^{128}) = E_K(0)$.
• The additional authenticated data $A$ is represented as $m$ blocks of 128 bits: $A_1, A_2, \ldots, A_m$.
• The plaintext $P$ is divided into $n$ blocks of 128 bits: $P_1, P_2, \ldots, P_n$.
• An up-counter with output $U_i$ is used to generate the blocks of ciphertext: $C_i = P_i \oplus E_K(U_i)$ for $i = 1, 2, \ldots, n$.

AES-GCM Block Diagram (cont.)
• Using the inputs $H$, $A$ and $C$, the output of the GCM is defined by $X_{m+n+1} = \mathrm{GHASH}(H, A, C)$.
• The 128-bit register $Y$:
  ◦ is cleared initially;
  ◦ after the $(m+n+1)$th clock cycle, contains $X_{m+n+1} = \mathrm{GHASH}(H, A, C)$.
• In this paper, we consider the GCM loop.
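The loop on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the hardware design from the talk: integers stand for field elements with bit $j$ holding the coefficient of $\alpha^j$ (a plain polynomial-basis convention chosen here, which differs from the bit ordering used in SP 800-38D), and the names `gf_mul` and `ghash_loop` are hypothetical. The block list is assumed to be the $m$ blocks of $A$, the $n$ blocks of $C$, and one final length block, which gives the $m+n+1$ iterations.

```python
# Assumed convention: bit j of an int is the coefficient of alpha^j.
F_LOW = 0x87               # x^7 + x^2 + x + 1, the low part of F(x)
MASK = (1 << 128) - 1      # keeps values inside 128 bits


def gf_mul(h, d):
    """Return h * d mod F(alpha) in GF(2^128) by shift-and-add."""
    x, z = 0, h            # z walks through h * alpha^j mod F(alpha)
    for j in range(128):
        if (d >> j) & 1:
            x ^= z         # accumulate the term d_j * (h * alpha^j)
        carry = z >> 127
        z = (z << 1) & MASK            # multiply z by alpha
        if carry:
            z ^= F_LOW                 # replace alpha^128 by its reduction
    return x


def ghash_loop(h, blocks):
    """Iterate Y <- (Y + block) * H, starting from a cleared register Y."""
    y = 0
    for block in blocks:
        y = gf_mul(h, y ^ block)       # addition in GF(2^128) is XOR
    return y
```

With one block the loop reduces to a single multiplication, and with two blocks it nests as `gf_mul(h, gf_mul(h, b1) ^ b2)`, which is the Horner-style evaluation that the register $Y$ carries out one clock cycle at a time.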

Single-bit Fault Detection Scheme
• The parity of the multiplier output ($X_i$) is computed using two different functions:
  1. The actual parity ($p_{X_i}$) is obtained by XORing the coordinates of $X_i$.
  2. The predicted parity is a complex function of $H$, $C_i$ and $Y$: $\hat{p}_{X_i} = f(H, C_i, Y)$.
• Then, they are compared to find an error: if $p \neq \hat{p} \Rightarrow e_{out} = 1$.

Single-bit Parity Prediction Formulations
We write the multiplier output as follows:
• $X_i = H \times D_i \bmod F(\alpha)$, where $\alpha$ is the root of the irreducible polynomial $F(x) = x^{128} + x^7 + x^2 + x + 1$ and $0 \le i \le m + n + 1$.
• The hash key $H \in GF(2^{128})$ is fixed in each iteration $i$.
• The field element $D_i = \sum_{j=0}^{127} d_j \alpha^j$ (drop $i$ for simplicity).
• $X_i = \sum_{j=0}^{127} d_j Z^{(j)}$, where $Z^{(j)} = (H \alpha^j) \bmod F(\alpha)$ and $Z^{(0)} = H$.
Then, the parity prediction of the multiplier output is:
$\hat{p}_{X_i} = \sum_{j=0}^{127} d_j \, \hat{p}_{Z^{(j)}}$.

Single-bit Parity Prediction Formulations (cont.)
$\hat{p}_{X_i} = \sum_{j=0}^{127} d_j \, \hat{p}_{Z^{(j)}}$
Since $D = Y + C \Rightarrow d_j = y_j + c_j$, we have
$\hat{p}_{X_i} = \sum_{j=0}^{127} y_j \, \hat{p}_{Z^{(j)}} + \sum_{j=0}^{127} c_j \, \hat{p}_{Z^{(j)}}$.
• $\hat{p}_{Z^{(j)}}$, $0 \le j \le 127$, is a binary function and depends on the coordinates of $H \in GF(2^{128})$:
  ◦ $Z^{(0)} = H \Rightarrow \hat{p}_{Z^{(0)}} = p_H$.
  ◦ $Z^{(1)} = Z^{(0)} \alpha \bmod F(\alpha) \Rightarrow \hat{p}_{Z^{(1)}} = \hat{p}_{Z^{(0)}} + h_{127}$.
  ◦ In general: $\hat{p}_{Z^{(j)}} = \hat{p}_{Z^{(j-1)}} + z^{(j-1)}_{127}$ for $1 \le j \le 127$.
• These values are stored in a register (PH) at the initialization phase.
• They remain constant for the entire $m + n + 1$ cycles of the GCM computation.
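The recurrence for $\hat{p}_{Z^{(j)}}$ and the split into a $y$ part and a $c$ part can be checked with a small software model. The sketch below is an assumption-laden illustration rather than the authors' circuit: bit $j$ of an integer is taken as the coefficient of $\alpha^j$, the PH register is modeled as a 128-bit integer, and the helper names (`init_ph`, `predicted_parity`) are made up for this example. The recurrence works because multiplying by $\alpha$ drops the coefficient $z_{127}$ and, when it is 1, adds the four terms of $\alpha^7 + \alpha^2 + \alpha + 1$, so the parity flips exactly when $z_{127} = 1$.

```python
# Assumed convention: bit j of an int is the coefficient of alpha^j.
F_LOW = 0x87               # x^7 + x^2 + x + 1, the low part of F(x)
MASK = (1 << 128) - 1


def parity(v):
    """XOR of all coordinates of v."""
    return bin(v).count("1") & 1


def mul_alpha(z):
    """Return z * alpha mod F(alpha)."""
    carry = z >> 127
    z = (z << 1) & MASK
    return z ^ F_LOW if carry else z


def gf_mul(h, d):
    """Reference multiplier: sum over j of d_j * Z^(j)."""
    x, z = 0, h
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        z = mul_alpha(z)
    return x


def init_ph(h):
    """Model of the PH register: bit j holds the parity of Z^(j)."""
    ph, z, p = 0, h, parity(h)         # p starts as p_H
    for j in range(128):
        ph |= p << j
        p ^= z >> 127                  # p(Z^(j+1)) = p(Z^(j)) + z_127^(j)
        z = mul_alpha(z)
    return ph


def predicted_parity(ph, y, c):
    """Predicted parity: sum of y_j * PH_j plus sum of c_j * PH_j (mod 2)."""
    return parity(y & ph) ^ parity(c & ph)
```

For any `h`, `y` and `c`, `predicted_parity(init_ph(h), y, c)` should equal `parity(gf_mul(h, y ^ c))` as long as no fault is present, since parity is linear over GF(2).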

Single Parity Fault Detection Architecture
• The actual and predicted parities are computed and compared in each clock cycle to generate the output error signal.
$\hat{p}_{X_i} = \sum_{j=0}^{127} y_j \, \hat{p}_{Z^{(j)}} + \sum_{j=0}^{127} c_j \, \hat{p}_{Z^{(j)}}$

CRC-Based Fault Detection Scheme
• We extend the idea from a single parity bit to multiple bits.
• The Cyclic Redundancy Check (CRC) code has been adopted to detect errors in the GCM loop.
• For $k$ parity bits, the CRC generator polynomial must be of degree $k$: $g_k(x) = x^k + \cdots + g_1 x + 1$.
• Let us denote the output of the multiplier in the GCM loop as the message: $m(x) = X_i(x)$.
  1. Compute the actual $k$-bit parity: $p(x) = m(x) \bmod g_k(x)$.
  2. Compute the $k$-bit predicted parity: $\hat{p}(x) = f(C, H, Y)$.
  3. Compare them to detect an error: if $p(x) \neq \hat{p}(x) \Rightarrow e_{out} = 1$.

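Matrix-Based CRC Formulations
1. The $k$ parity bits of the multiplier output are computed as
$\mathbf{p}_{\text{CRC-}k} = [p_0 \; p_1 \; \ldots \; p_{k-1}] = [m_0 \; m_1 \; \ldots \; m_{127}] \, \mathbf{G}_{\text{CRC-}k}$.
• $m_j \in \{0, 1\}$ is the $j$-th coordinate of the multiplier output $X_i$.
• $\mathbf{G}_{\text{CRC-}k}$ is the $128 \times k$ CRC generator matrix.
• The $j$-th row, $0 \le j \le 127$, of $\mathbf{G}_{\text{CRC-}k}$ contains the coefficients of $x^j \bmod g_k(x)$.
• For $k = 1$ (single-bit parity), $g_1(x) = x + 1$ and then $\mathbf{G}_{\text{CRC-}1} = [1 \; 1 \; \ldots \; 1]^T \Rightarrow p = m_0 + m_1 + \cdots + m_{127}$.
• For $2 \le k \le 4 \Rightarrow$ (generator matrices shown as a figure on the slide)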
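A short sketch may make the matrix view concrete. It builds the rows $x^j \bmod g_k(x)$ and checks that multiplying the message by the matrix gives the same parity bits as a direct polynomial remainder. The function names are hypothetical, integers encode polynomials with bit $j$ as the coefficient of $x^j$, and the degree-3 polynomial $x^3 + x + 1$ used in the usage note is an arbitrary example, not necessarily one of the polynomials chosen in the talk.

```python
def poly_mod(m, g):
    """Remainder of m(x) divided by g(x) over GF(2)."""
    k = g.bit_length() - 1             # degree of the generator polynomial
    while m.bit_length() > k:          # while deg(m) >= k
        m ^= g << (m.bit_length() - 1 - k)
    return m


def crc_matrix(g, n=128):
    """Rows of the n-by-k matrix G: row j encodes x^j mod g(x)."""
    return [poly_mod(1 << j, g) for j in range(n)]


def crc_by_matrix(m, rows):
    """Row vector [m_0 ... m_(n-1)] times G: XOR the rows selected by m."""
    p = 0
    for j, row in enumerate(rows):
        if (m >> j) & 1:
            p ^= row
    return p
```

With `g = 0b11` (that is, $x + 1$) every row equals 1, so `crc_by_matrix` collapses to the single-bit parity of the message. With `g = 0b1011` the result agrees with `poly_mod(m, 0b1011)` for any 128-bit `m`, because reduction modulo $g_k(x)$ is linear over GF(2).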

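Matrix-Based CRC Formulations (cont.)
2. To calculate the $k$ predicted parity bits, we use the Mastrovito formulation for the multiplier output:
$\mathbf{m} = [m_0 \; m_1 \; \ldots \; m_{127}]^T = \mathbf{E}\mathbf{d}$
• The entries of $\mathbf{E}$ contain coordinates of $H$ only.
• $\mathbf{d} = \mathbf{y} + \mathbf{c}$ is a vector with the coordinates of $D_i = Y_i + C_i$.
• Substituting $\mathbf{m}^T = \mathbf{d}^T \mathbf{E}^T$ into $\mathbf{p}_{\text{CRC-}k} = \mathbf{m}^T \mathbf{G}_{\text{CRC-}k}$, we obtain
$\hat{\mathbf{p}}_{\text{CRC-}k} = [\hat{p}_0 \; \hat{p}_1 \; \ldots \; \hat{p}_{k-1}] = \mathbf{d}^T \mathbf{E}^T \mathbf{G}_{\text{CRC-}k} = \mathbf{y}^T \mathbf{O}_{\text{CRC-}k} + \mathbf{c}^T \mathbf{O}_{\text{CRC-}k}$
• The entries of $\mathbf{O}_{\text{CRC-}k} = \mathbf{E}^T \mathbf{G}_{\text{CRC-}k}$ are functions of $H$ only.
• They are stored in $k$ 128-bit registers at the initialization phase.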
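The prediction step can be modeled in software as follows. This is a sketch under assumptions, not the architecture from the talk: row $j$ of the precomputed matrix is taken to be the CRC of $H\alpha^j \bmod F(\alpha)$ (the $j$-th column of the multiplication matrix, reduced modulo $g_k(x)$), bit $j$ of an integer is the coefficient of $\alpha^j$, and the table is stored row-wise as 128 small integers, which is the transpose of the $k$ 128-bit registers mentioned on the slide. All function names are hypothetical.

```python
# Assumed convention: bit j of an int is the coefficient of alpha^j.
F_LOW = 0x87               # x^7 + x^2 + x + 1, the low part of F(x)
MASK = (1 << 128) - 1


def poly_mod(m, g):
    """Remainder of m(x) divided by g(x) over GF(2)."""
    k = g.bit_length() - 1
    while m.bit_length() > k:
        m ^= g << (m.bit_length() - 1 - k)
    return m


def mul_alpha(z):
    """Return z * alpha mod F(alpha)."""
    carry = z >> 127
    z = (z << 1) & MASK
    return z ^ F_LOW if carry else z


def gf_mul(h, d):
    """Reference multiplier output m = E d."""
    x, z = 0, h
    for j in range(128):
        if (d >> j) & 1:
            x ^= z
        z = mul_alpha(z)
    return x


def init_o(h, g):
    """Rows of O = E^T G: row j is the CRC of h * alpha^j mod F(alpha)."""
    rows, z = [], h
    for _ in range(128):
        rows.append(poly_mod(z, g))
        z = mul_alpha(z)
    return rows


def select_xor(v, rows):
    """Row vector v^T times a matrix given by its rows."""
    acc = 0
    for j, row in enumerate(rows):
        if (v >> j) & 1:
            acc ^= row
    return acc


def predicted_crc(o_rows, y, c):
    """Predicted parities: y^T O + c^T O."""
    return select_xor(y, o_rows) ^ select_xor(c, o_rows)
```

In the fault-free case `predicted_crc(init_o(h, g), y, c)` should match `poly_mod(gf_mul(h, y ^ c), g)`, and the table depends only on `h` and `g`, so it can be computed once per key.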

Matrix-Based CRC Formulations (cont.)
3. After calculating $[p_0 \; p_1 \; \ldots \; p_{k-1}]$ and $[\hat{p}_0 \; \hat{p}_1 \; \ldots \; \hat{p}_{k-1}]$, we compare all $k$ actual parities with the corresponding predicted parities to generate the output error signal:
$e_{out} = (p_0 + \hat{p}_0) \vee (p_1 + \hat{p}_1) \vee \cdots \vee (p_{k-1} + \hat{p}_{k-1})$
• It requires $k$ 2-input XOR gates and a $k$-input OR gate.
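The comparison can be illustrated with a toy fault-injection model. Everything here is an assumption made for illustration: faults are modeled as bit flips XORed into the multiplier output, the predicted parity is taken to be the CRC of the fault-free output, the generator $x^3 + x + 1$ in the usage note is an arbitrary choice, and the function names are hypothetical.

```python
def poly_mod(m, g):
    """Remainder of m(x) divided by g(x) over GF(2); bit j is the x^j term."""
    k = g.bit_length() - 1
    while m.bit_length() > k:
        m ^= g << (m.bit_length() - 1 - k)
    return m


def error_signal(p, p_hat, k):
    """e_out: OR over the k bitwise XORs of actual and predicted parities."""
    diff = (p ^ p_hat) & ((1 << k) - 1)    # k two-input XOR gates
    return int(diff != 0)                  # one k-input OR gate


def detects(x, fault, g):
    """Return e_out when the output x is corrupted by the pattern `fault`."""
    k = g.bit_length() - 1
    p_hat = poly_mod(x, g)                 # prediction assumed fault-free
    p = poly_mod(x ^ fault, g)             # actual parity of the faulty output
    return error_signal(p, p_hat, k)
```

Under this model, `detects(x, 1 << j, 0b1011)` returns 1 for every bit position `j`, because $x^j \bmod g_k(x)$ is never zero when $g_k(x)$ has a constant term of 1. A fault pattern that is itself a multiple of the generator, such as `0b1011 << 5`, leaves the remainder unchanged and goes undetected, which is one way to see why coverage grows with the number of parity bits.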

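CRC-Based Fault Detection Architecture
• The actual and predicted parities are computed and compared in each clock cycle to generate the output error signal.
$\mathbf{p}_{\text{CRC-}k} = [p_0 \; p_1 \; \ldots \; p_{k-1}] = [m_0 \; m_1 \; \ldots \; m_{127}] \, \mathbf{G}_{\text{CRC-}k}$
$\hat{\mathbf{p}}_{\text{CRC-}k} = [\hat{p}_0 \; \hat{p}_1 \; \ldots \; \hat{p}_{k-1}] = \mathbf{y}^T \mathbf{O}_{\text{CRC-}k} + \mathbf{c}^T \mathbf{O}_{\text{CRC-}k}$
$e_{out} = (p_0 + \hat{p}_0) \vee (p_1 + \hat{p}_1) \vee \cdots \vee (p_{k-1} + \hat{p}_{k-1})$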

Fault Simulation Results
• We wrote VHDL code to simulate the entire fault detection scheme for the GCM using ModelSim.
• We considered CRC generator polynomials up to degree six.
• Different cases of single- and multiple-bit faults (300,000 in total) were injected into different modules of the proposed fault detection architecture.
• As the number of parity bits increases, fault coverage increases and can reach 100% with an acceptable false-alarm rate.

FPGA Implementations and Overheads
• We implemented the original GCM and six fault detection architectures on Altera's 28 nm FPGA.
• Their areas, in terms of the number of ALMs (Adaptive Logic Modules), and their longest delays were recorded.
• The area and time overheads of the fault detection schemes are presented relative to the original design.
• For a fault coverage of 98% (k=6), the area overhead is 10.9% and the delay overhead is 23%.

Conclusion
• We proposed a reliable GCM scheme capable of detecting permanent and transient faults.
• The proposed fault detection scheme checks the validity of the GCM computation in every clock cycle.
• Based on the available overhead budget and/or the required fault coverage, the number of parity bits (and hence the CRC generator polynomial) can be selected.
• We performed fault simulations and FPGA implementations.
• We considered single and multiple faults in all locations of the GCM, parity generation, and parity prediction modules.
• The proposed fault detection scheme has high fault coverage with low overheads and a negligible false-alarm rate.

Thank You & Questions?
