The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-channel Evaluations
Stjepan Picek, Annelie Heuser, Alan Jovic, Shivam Bhasin, and Francesco Regazzoni
Big Picture
[Figure: profiled side-channel analysis. 1. Profiling: side-channel measurements, plaintexts, and labels from the training device are fed to a classifier (template building / ML training) to build a profiled model. 2. Attacking: measurements and plaintexts from the target device are evaluated with that model; template evaluation + maximum likelihood yields success rate and guessing entropy, while ML testing yields accuracy.]
Labels
• typically: intermediate states computed from plaintexts and keys
• Hamming weight (distance) leakage model commonly used
• problem: introduces imbalanced data
• for example, occurrences of Hamming weights for all possible 8-bit values (see the sketch below)
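As an illustration of that imbalance (the histogram from the slide is not reproduced here), the following sketch, written for this summary rather than taken from the paper, counts how often each Hamming weight occurs over the 256 possible byte values:

```python
# Illustrative sketch: Hamming-weight class sizes for all 8-bit values.
from collections import Counter

hamming_weight = lambda v: bin(v).count("1")
counts = Counter(hamming_weight(v) for v in range(256))

for w in range(9):
    print(f"HW={w}: {counts[w]:3d} / 256  ({counts[w] / 256:5.1%})")
# The counts follow a binomial distribution: HW=4 alone covers 70/256 (~27%)
# of all values, while HW=0 and HW=8 each cover only 1/256.
```

This is exactly the imbalance that later lets a classifier reach roughly 27% accuracy by always predicting class 4.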
Why do we use HW?
• often does not reflect the realistic leakage model
[Figure: example leakage labelled "HW" vs "not HW"]
Why do we use HW?
• reduces the complexity of learning
• works (sufficiently well) in many attacking scenarios
Why do we care about imbalanced data?
• most machine learning techniques rely on loss functions that are "designed" to maximise accuracy
• in case of high noise: predicting only HW class 4 gives an accuracy of 27%
• but this prediction is not related to the secret key value and therefore gives no information for SCA
What to do?
• in this paper: transform the dataset to achieve balance
• how?
• throw away data
• add data
• (or choose data before ciphering)
Random undersampling
• only keep a number of samples equal to the least populated class
• binomial distribution: many unused samples
[Figure: Class 1 has 7 samples, Class 2 has 13 samples; after undersampling both classes have 7 samples]
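A minimal sketch of random undersampling, assuming the profiling traces and HW labels are NumPy arrays `X` and `y` (names chosen here for illustration, not from the paper):

```python
import numpy as np

def random_undersample(X, y, seed=0):
    """Keep, for every class, as many samples as the least populated class has."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]
```

Because the HW classes are binomially distributed, most samples from the well-populated middle classes are simply discarded.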
Random oversampling with replacement
• randomly select samples from the original dataset until each class is as large as the most populated one
• simple method; in other contexts comparable to more elaborate methods
• it may happen that some samples are never selected at all
[Figure: Class 1 grows from 7 to "13" samples by drawing its samples with replacement (some drawn several times, some never); Class 2 stays at 13 samples]
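The oversampling counterpart, again only a sketch under the same assumed `X`/`y` setup:

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Draw samples with replacement until every class matches the largest one."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_max, replace=True)
        for c in classes
    ])
    return X[keep], y[keep]   # minority samples are duplicated; some originals
                              # may never be drawn at all
```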
SMOTE
• Synthetic Minority Oversampling Technique
• generates synthetic minority class instances
• synthetic samples are created between a minority sample and its nearest neighbours (with respect to Euclidean distance)
[Figure: Class 1 grows from 7 to 13 samples; Class 2 stays at 13 samples]
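In practice SMOTE is available off the shelf; here is a sketch using the imbalanced-learn library (the paper does not prescribe this particular implementation, and `X_profiling`/`y_profiling` are placeholder names):

```python
from imblearn.over_sampling import SMOTE

# Each synthetic trace is an interpolation between a minority-class trace and
# one of its k nearest neighbours (Euclidean distance in the feature space).
smote = SMOTE(k_neighbors=5, random_state=0)
X_balanced, y_balanced = smote.fit_resample(X_profiling, y_profiling)
# all classes now contain as many samples as the originally largest class
```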
SMOTE+ENN
• Synthetic Minority Oversampling Technique with Edited Nearest Neighbours
• SMOTE + data cleaning
• oversampling + undersampling
• removes data samples whose class differs from that of multiple neighbours
[Figure: Class 1 with 7 and Class 2 with 13 samples; after SMOTE+ENN both classes end up at roughly 10 samples]
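The combined resampler is likewise available in imbalanced-learn; another hedged sketch with the same placeholder names as above:

```python
from imblearn.combine import SMOTEENN

# SMOTE first oversamples the minority classes; Edited Nearest Neighbours then
# removes samples whose class disagrees with most of their neighbours, so the
# final class sizes may end up below the original majority count.
smote_enn = SMOTEENN(random_state=0)
X_balanced, y_balanced = smote_enn.fit_resample(X_profiling, y_profiling)
```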
Experiments
• in most experiments SMOTE was the most effective
• data augmentation to balance the datasets, without any specific knowledge about the implementation / dataset / distribution
• varying number of training samples in the profiling phase
• imbalanced: 1k, 10k, 50k
• SMOTE: (approx.) 5k, 24k, 120k
Dataset 1
• low-noise dataset: DPA contest v4 (publicly available)
• Atmel ATMega-163 smart card connected to a SASEBO-W board
• AES-256 RSM (Rotating SBox Masking)
• in this talk: mask assumed known
Data sampling techniques • dataset 1: low noise unprotected
Dataset 2
• high-noise dataset
• AES-128 on a Xilinx Virtex-5 FPGA of a SASEBO-GII evaluation board
• publicly available on GitHub: https://github.com/AESHD/AES_HD_Dataset
Data sampling techniques • dataset 2: high noise unprotected
Dataset 3
• AES-128: random delay countermeasure => misaligned traces
• 8-bit Atmel AVR microcontroller
• publicly available on GitHub: https://github.com/ikizhvatov/randomdelays-traces
Data sampling techniques • dataset 3: high noise with random delay
Further results
• additionally we tested SMOTE for CNN, MLP, TA:
• also beneficial for CNN and MLP
• not for TA (in these settings):
• TA is not "tuned" with respect to accuracy
• TA may still benefit if the number of measurements is too low to build stable profiles (fewer measurements for profiling)
• if available: a perfectly "natural"/chosen balanced dataset leads to better performance
• … more details in the paper
Big Picture (recap)
[Same figure as before: 1. profiling builds the model; 2. the attacking phase is scored either by ML testing (accuracy) or by template evaluation + maximum likelihood (success rate, guessing entropy)]
Evaluation metrics
• ACC: average estimated probability (percentage) of correct classification; the average is computed over a number of experiments
• SR: average estimated probability of success
• GE: average estimated secret key rank; depends on the number of traces used in the attacking phase; the average is computed over a number of experiments
• no direct translation between ACC and SR/GE; only an indication: if accuracy is high, GE/SR should "converge quickly"
SR/GE vs acc
Global acc vs class acc:
• relevant for a non-bijective function between class and key (e.g., the class is the HW)
• correctly classifying the more unlikely class values may matter more than the others
• accuracy is averaged over all class values
• low accuracy may not indicate low SR/GE
Label vs fixed key prediction:
• relevant if attacking with more than 1 trace
• accuracy: each label is considered independently (along #measurements)
• SR/GE: computed with respect to a fixed key, accumulated over #measurements
more details, formulas, explanations in the paper…
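To make the contrast concrete, here is a sketch (my own illustration, with assumed inputs: `probs` holding the classifier's per-trace HW-class probabilities, `plaintexts` the corresponding plaintext bytes, `sbox` the AES S-box as a NumPy array, and `true_key` the targeted key byte) of how SR/GE accumulate evidence towards one fixed key, which accuracy never does:

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])

def key_rank(probs, plaintexts, sbox, true_key):
    """Rank of the correct key byte after accumulating per-trace log-likelihoods."""
    log_lik = np.zeros(256)
    for pt, pr in zip(plaintexts, probs):
        labels = HW[sbox[pt ^ np.arange(256)]]     # predicted HW class per key guess
        log_lik += np.log(pr[labels] + 1e-36)      # accumulate over #measurements
    ranking = np.argsort(log_lik)[::-1]            # most likely key guess first
    return int(np.where(ranking == true_key)[0][0])

# Guessing entropy = mean key rank over many experiments;
# success rate    = fraction of experiments where the rank is 0.
# Accuracy, by contrast, scores each trace on its own label only.
```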
Take away
• HW (HD) + ML is very likely to go wrong on noisy data!
• data sampling techniques help to increase performance
• it is more effective to collect fewer real samples and apply balancing techniques than to collect more imbalanced samples
• ML metrics (accuracy) do not give a precise SCA evaluation!
✴ global vs class accuracy
✴ label vs fixed key prediction