Updates-Leak: Data Set Inference and Reconstruction Attacks in Online Learning
Ahmed Salem, Apratim Bhattacharya, Michael Backes, Mario Fritz, Yang Zhang
CISPA Helmholtz Center for Information Security, Max Planck Institute for Informatics

Online Learning
• Data generation rate: 90% of the data in the world today has been created in the last two years alone
• Cost of retraining
[Figure: Training set → Train → Model; Updating set → Update → Model]

Attack Surface in Online Learning
[Figure: the target model's posteriors over classes 0-9 for the same inputs, before and after an update]
Research Question: Can this posterior difference be a new attack surface?

Threat Model
• Attacker has black-box access to the target model
• Attacker knows:
  ‣ The target model's architecture
  ‣ A shadow dataset drawn from the same distribution as the target model's dataset

General Attack Pipeline
[Figure: Probing set → Target Model (queried before and after the update) → Posterior difference → Attack Model (Encoder → Decoder)]
The attack model's decoder is specialized for one of four attacks:
• Single-sample label inference
• Single-sample reconstruction
• Multi-sample label distribution
• Multi-sample reconstruction

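The first step of the pipeline, computing the posterior difference, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `posterior_difference`, `model_before`, and `model_after` are hypothetical, the two toy models return fixed posteriors, and the sign convention (after minus before) is an assumption the slides do not state.

```python
from typing import Callable, List, Sequence

Posterior = List[float]
Model = Callable[[Sequence[float]], Posterior]


def posterior_difference(before: Model, after: Model,
                         probing_set: Sequence[Sequence[float]]) -> List[float]:
    """Query both model versions on every probing sample and concatenate
    the per-class differences (after - before; the sign is an assumption)."""
    delta: List[float] = []
    for x in probing_set:
        p_before = before(x)
        p_after = after(x)
        delta.extend(a - b for a, b in zip(p_after, p_before))
    return delta


# Hypothetical stand-ins for the black-box target model before and after an update.
def model_before(x: Sequence[float]) -> Posterior:
    return [0.5, 0.5]


def model_after(x: Sequence[float]) -> Posterior:
    return [0.7, 0.3]


probing_set = [[0.0], [1.0]]
delta = posterior_difference(model_before, model_after, probing_set)
print([round(d, 3) for d in delta])
```

The resulting vector has one entry per (probing sample, class) pair and is what the attack model's encoder would take as input.
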
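Attack Model Training
The attacker uses what it knows:
• Target model's architecture
• Shadow dataset
[Figure: a Shadow Model is updated with updating set 1 ... updating set n, giving Shadow Updated Model 1 ... n. The Probing Set is fed to the shadow model and to each updated model, giving Posterior difference 1 ... n. The attack model is trained with the posterior differences as input X and the corresponding updating sets as target Y.]
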
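The shadow-model procedure above can be sketched in a few lines. Everything here is an illustrative assumption: a toy nearest-centroid classifier stands in for the shadow model (the real attack uses the target model's architecture), each updating set holds a single sample, and the helper names `train`, `update`, `posterior`, and `build_attack_training_set` are hypothetical.

```python
import math
import random


def update(model, data):
    """Return a copy of the toy model updated with new (x, label) pairs."""
    sums = [row[:] for row in model[0]]
    counts = model[1][:]
    for x, y in data:
        counts[y] += 1
        for i, v in enumerate(x):
            sums[y][i] += v
    return sums, counts


def train(data, n_classes, dim):
    """Toy 'model': per-class feature sums and counts (nearest-centroid classifier)."""
    empty = ([[0.0] * dim for _ in range(n_classes)], [0] * n_classes)
    return update(empty, data)


def posterior(model, x):
    """Softmax over negative squared distances to each class centroid."""
    sums, counts = model
    scores = []
    for s, c in zip(sums, counts):
        centroid = [v / max(c, 1) for v in s]
        scores.append(-sum((a - b) ** 2 for a, b in zip(x, centroid)))
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_attack_training_set(shadow_data, probing_set, n_updates, n_classes, dim, rng):
    """Pair each shadow update's posterior difference (X) with the label of
    the sample the shadow model was updated on (Y)."""
    half = len(shadow_data) // 2
    shadow_model = train(shadow_data[:half], n_classes, dim)
    before = [posterior(shadow_model, x) for x in probing_set]
    examples = []
    for _ in range(n_updates):
        x_new, y_new = rng.choice(shadow_data[half:])
        updated = update(shadow_model, [(x_new, y_new)])
        delta = []
        for x, p_before in zip(probing_set, before):
            p_after = posterior(updated, x)
            delta.extend(a - b for a, b in zip(p_after, p_before))
        examples.append((delta, y_new))
    return examples


# Synthetic two-class shadow dataset (an assumption for illustration only).
rng = random.Random(0)
shadow_data = [([rng.gauss(y, 0.3), rng.gauss(-y, 0.3)], y)
               for y in (0, 1) for _ in range(20)]
rng.shuffle(shadow_data)
probing_set = [[0.0, 0.0], [1.0, -1.0], [0.5, -0.5]]
examples = build_attack_training_set(shadow_data, probing_set,
                                     n_updates=5, n_classes=2, dim=2, rng=rng)
```

Each element of `examples` is one (X, Y) training pair for the attack model. Every update starts from the same shadow model, mirroring the figure's "updating set 1 ... n" branches.
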
Single-sample Label Inference
[Figure: attack pipeline with the single-sample label inference decoder; example output: "It is a 0"]
• More than 6x better than the baseline on MNIST and more than 9x better on CIFAR-10

Single-sample Reconstruction
[Figure: attack pipeline with the single-sample reconstruction decoder]
• More complicated than inferring the label
• Attacker needs a sample generator
  ‣ We rely on an autoencoder's decoder

Autoencoder
[Figure: autoencoder illustration]

Single-sample Reconstruction
[Figure: an Autoencoder (Encoder → Decoder) and the attack model (Encoder → Decoder), with a "Transfer" arrow between them]

Single-sample Reconstruction
[Figure: bar charts of mean squared error (MSE) on MNIST and CIFAR-10, comparing Autoencoder (Oracle), A_SSR, Label-random, and Random]

Multi-sample Label Estimation
[Figure: attack pipeline with the multi-sample label distribution decoder; the output is a distribution over classes 0-9]
• KL-divergence as the loss

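A small sketch of the KL-divergence loss named above, under stated assumptions: the target is the empirical label distribution of the updating set, the direction is KL(true || predicted), and the helper names `label_distribution` and `kl_divergence` are hypothetical. The slides do not specify these details.

```python
import math
from collections import Counter


def label_distribution(labels, n_classes):
    """Empirical label distribution of an updating set."""
    counts = Counter(labels)
    return [counts[c] / len(labels) for c in range(n_classes)]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0.
    q is clamped at eps to avoid division by zero (an implementation choice)."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


true_dist = label_distribution([0, 0, 1, 1], n_classes=2)  # [0.5, 0.5]
predicted = [0.9, 0.1]  # hypothetical attack-model output
loss = kl_divergence(true_dist, predicted)
```

The loss is zero only when the predicted distribution matches the true one, so minimizing it pushes the decoder's output toward the updating set's label proportions.
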
Multi-sample Label Estimation
[Figure: bar charts of KL-divergence for MNIST (100), CIFAR-10 (100), MNIST (10), and CIFAR-10 (10), comparing A_LDE, Baseline, and the transfer settings (Transfer 100-10, Transfer 10-100)]

Multi-sample Reconstruction
[Figure: attack pipeline with the multi-sample reconstruction decoder]
• The most challenging of the attack scenarios
• Reconstruct a set of data samples
• An autoencoder cannot help anymore
• What do we do?

Generative Adversarial Network (GAN)
[Figure: GAN illustration. Image credit: Thalles Silva]

Multi-sample Reconstruction
[Figure: components shown are Standard Gaussian Noise, Encoder, Generator, Discriminator, and a Best match loss]

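The slide names a best match loss without defining it. One plausible form, shown here purely as an assumption, is a two-way nearest-neighbour loss: every generated sample is pulled toward its closest updating-set sample, and every updating-set sample toward its closest generated sample. The function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two flat samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match_loss(generated, targets):
    """Two-way nearest-neighbour loss (an assumed form, not confirmed by the slides)."""
    # Each generated sample should be close to some target sample.
    gen_to_target = sum(min(squared_distance(g, t) for t in targets)
                        for g in generated)
    # Each target sample should be covered by some generated sample.
    target_to_gen = sum(min(squared_distance(t, g) for g in generated)
                        for t in targets)
    return gen_to_target + target_to_gen


generated = [[0.0, 0.0], [1.0, 1.0]]
targets = [[0.0, 0.0], [1.0, 2.0]]
loss = best_match_loss(generated, targets)
```

The second term is what discourages the generator from collapsing onto a single target sample, since uncovered targets keep contributing to the loss.
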
Multi-sample Reconstruction
• One-to-one match
[Figure: bar chart of mean squared error (MSE) on MNIST and CIFAR-10, comparing A_MSR and Baseline]

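The one-to-one match evaluation can be illustrated with a brute-force sketch: each reconstructed sample is paired with a distinct original sample so that the average MSE is minimal. Trying all permutations is an assumption made for brevity and only works for tiny sets; the function names are hypothetical.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two flat samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def one_to_one_mse(generated, targets):
    """Return (lowest average MSE, assignment), where assignment[i] is the
    index of the target matched to generated[i]."""
    if len(generated) != len(targets):
        raise ValueError("one-to-one matching needs equally sized sets")
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(targets))):
        cost = sum(mse(g, targets[j]) for g, j in zip(generated, perm)) / len(generated)
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return best_cost, best_perm


generated = [[0.0, 0.0], [1.0, 1.0]]
targets = [[1.0, 1.0], [0.0, 0.5]]
cost, assignment = one_to_one_mse(generated, targets)
```

For realistically sized updating sets, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual replacement for the permutation loop.
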
Multi-sample Reconstruction
[Figures: two image-only slides]

Summary
[Figure: the general attack pipeline (Probing set → Target Model before and after the update → Posterior difference → Attack Model Encoder → Decoder) with its four outputs: single-sample label inference ("It is a 0"), single-sample reconstruction, multi-sample label distribution, and multi-sample reconstruction]
Thank you for your attention! Questions?
ahmed.salem@cispa.saarland
https://ahmedsalem2.github.io/
@AhmedGaSalem