

  1. FedSel: Federated SGD under Local Differential Privacy with Top-k Dimension Selection. Ruixuan Liu¹, Yang Cao², Masatoshi Yoshikawa², Hong Chen¹. ¹Renmin University of China, ²Kyoto University. DASFAA 2020.

  2-7. Federated Learning Overview: Each user's device holds sensitive information such as age, job, and location; the devices collaboratively train a global model with a central server instead of uploading raw data.

  8-10. Federated Learning Privacy Vulnerabilities: Even though the raw data (age, job, location, etc.) never leave the device, sensitive information can still be exposed through the exchanged updates.

  11. Federated Learning Privacy Vulnerabilities: Possible privacy attacks…
  • Membership inference: "Has the data of a target victim been used to train the model?"
  • Reconstruction attack: given a gender classifier, "What does a male look like?"
  • Unintended inference attack: given a gender classifier, "What is the race of the people in Bob's photos?"

  12-14. Differential Privacy for Federated Learning: The server adds noise to the aggregated updates, which protects the users' sensitive information but requires a trusted server.

  15-16. Local Differential Privacy for Federated Learning: Each user adds noise to their own update before uploading it, so an untrusted server is no longer a concern. LDP is a natural privacy definition for FL.

  17. Local Differential Privacy for Federated Learning: A randomized mechanism M satisfies ε-LDP if, for any two possible inputs v and v′ and any output o, Pr[M(v) = o] ≤ e^ε · Pr[M(v′) = o].
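
  The slide only shows the input/output picture, so here is a minimal, illustrative example that is not from the paper: one-bit randomized response, the textbook mechanism satisfying the ε-LDP definition above.

  ```python
  # Illustrative sketch only (not from the paper): one-bit randomized response,
  # the textbook mechanism satisfying the epsilon-LDP definition above.
  import math
  import random

  def randomized_response(bit: int, epsilon: float) -> int:
      """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
      p_true = math.exp(epsilon) / (math.exp(epsilon) + 1.0)
      return bit if random.random() < p_true else 1 - bit

  # For any output o and any two inputs v, v', Pr[M(v) = o] / Pr[M(v') = o]
  # is at most e^epsilon, which is exactly the guarantee stated above.
  ```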

  18-19. Challenges of LDP in Federated Learning
  [1] Wang N., Xiao X., Yang Y., et al. Collecting and Analyzing Multidimensional Data with Local Differential Privacy. In: 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 638-649. IEEE, 2019.
  For a d-dimensional vector under a local privacy budget ε, the metric of interest is the error of the estimated mean in each dimension.
  • If the budget is naively split across the d dimensions [1], the error grows super-linearly in d and can be excessive when d is large.
  An asymptotically optimal strategy [1]:
  1. Randomly sample k of the d dimensions, which increases the privacy budget available per reported dimension and reduces the noise variance incurred.
  2. Perturb each sampled dimension with budget ε/k.
  3. Aggregate on the server and scale the result up by the factor d/k.
  This achieves a per-dimension estimation error of O(√(d·log d) / (ε·√m)) over m users [1].
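
  A minimal sketch of that sample-then-perturb recipe, assuming gradient values are clipped to [-1, 1] and using Duchi et al.'s one-bit mechanism as the single-dimension ε-LDP primitive (the exact primitive used in [1] may differ):

  ```python
  import math
  import random

  def duchi_perturb(value: float, eps: float) -> float:
      """Unbiased eps-LDP perturbation of a single value in [-1, 1]."""
      t = (math.exp(eps) - 1.0) / (math.exp(eps) + 1.0)
      c = 1.0 / t                      # output magnitude (e^eps + 1) / (e^eps - 1)
      p_plus = 0.5 + 0.5 * t * value   # probability of reporting +c
      return c if random.random() < p_plus else -c

  def sample_and_perturb(grad, eps, k):
      """Report k randomly sampled dimensions, each perturbed with budget eps / k
      and scaled by d / k so the server-side mean stays unbiased."""
      d = len(grad)
      report = [0.0] * d
      for j in random.sample(range(d), k):
          report[j] = (d / k) * duchi_perturb(grad[j], eps / k)
      return report
  ```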

  20. Challenges of LDP in Federated Learning: the dimension curse! Typical orders of magnitude:
  • d: 100 to 1,000,000s of dimensions
  • m: 100 to 1,000s of users per round
  • ε: a smaller privacy budget means stronger privacy

  21. Our Intuition: the dimension curse is a bottleneck shared with distributed learning
  • Distributed learning: data are partitioned across workers to accelerate training, and gradient vectors are transmitted among them; the communication cost is the number of bits needed to represent each real value.
  • Gradient sparsification: reduce communication costs by transmitting only the important dimensions.
  • Intuition: dimensions with larger absolute magnitudes are more important, which suggests an efficient dimension reduction for LDP as well.
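
  For illustration only, here is plain, non-private Top-k gradient sparsification: the selection rule the intuition refers to and that FedSel later makes private.

  ```python
  def top_k_sparsify(grad, k):
      """Keep the k entries with the largest absolute value; zero out the rest."""
      top = sorted(range(len(grad)), key=lambda j: abs(grad[j]), reverse=True)[:k]
      keep = set(top)
      return [v if j in keep else 0.0 for j, v in enumerate(grad)]
  ```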

  22-23. Our Intuition: Selecting the Top-k dimensions is the common lever in both trade-offs: utility/learning performance versus the privacy budget, and utility/learning performance versus communication resources.

  24-25. Two-stage Framework: FedSel
  Top-k dimension selection is data-dependent, so it must itself be made private. A local sparse vector = Top-k information (which dimension) + the value.
  Two-stage framework, per user and per round:
  • Pull the global parameters from the server.
  • Calculate gradients with local data and update the locally accumulated vector.
  • Dimension Selection stage: privately select a Top-k dimension (ε1-LDP).
  • Value Perturbation stage: perturb the selected value (ε2-LDP).
  • Push the noisy sparse vector to the server, which averages the gradients and updates the global parameters.
  Sequential composition: the Top-k selection is ε1-LDP and the value perturbation is ε2-LDP, so the whole mechanism is ε-LDP with ε = ε1 + ε2. The next goal is the private dimension selection itself.
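
  A minimal sketch of the two-stage local report under the stated budget split; the function names, the plug-in selector interface, and the choice of value-perturbation primitive are my assumptions, not the authors' code.

  ```python
  import math
  import random

  def duchi_perturb(value: float, eps: float) -> float:
      """Unbiased eps-LDP perturbation of a value clipped to [-1, 1]."""
      t = (math.exp(eps) - 1.0) / (math.exp(eps) + 1.0)
      return (1.0 / t) if random.random() < 0.5 + 0.5 * t * value else -(1.0 / t)

  def fedsel_local_report(accumulated, k, eps1, eps2, select_fn):
      """Stage 1: `select_fn` is any eps1-LDP Top-k selector (EXP, PE or PS below).
      Stage 2: the value at the chosen dimension is perturbed with eps2.
      By sequential composition the report is (eps1 + eps2)-LDP."""
      d = len(accumulated)
      j = select_fn(accumulated, k, eps1)           # Stage 1: private dimension selection
      noisy = duchi_perturb(accumulated[j], eps2)   # Stage 2: value perturbation
      report = [0.0] * d
      report[j] = noisy
      return report
  ```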

  26-27. Methods: Exponential Mechanism (EXP)
  1. Sort the accumulated vector by absolute magnitude; the resulting ranking is denoted {r_1, …, r_d}.
  2. Sample one dimension unevenly, with a probability that grows exponentially with its rank, so higher-magnitude dimensions are selected more often.
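
  A sketch of such a rank-based exponential-mechanism selector; this is my reconstruction of the EXP idea, and the exact utility score and normalization in the paper may differ.

  ```python
  import math
  import random

  def exp_select(accumulated, k, eps1):
      """Sample a dimension with probability proportional to exp(eps1 * rank / (2d)).
      The rank (1 = smallest |value|, d = largest) is the utility score; its
      sensitivity is at most d, hence the 2d in the exponent. k is unused here
      because EXP scores every dimension instead of forming an explicit top-k set."""
      d = len(accumulated)
      order = sorted(range(d), key=lambda j: abs(accumulated[j]))   # ascending magnitude
      rank = {j: i + 1 for i, j in enumerate(order)}                # ranks 1..d
      weights = [math.exp(eps1 * rank[j] / (2.0 * d)) for j in range(d)]
      return random.choices(range(d), weights=weights, k=1)[0]
  ```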

  28-30. Methods: Perturbed Encoding Mechanism (PE)
  1. Sort the accumulated vector; the Top-k status of the dimensions is denoted {s_1, …, s_d}, where s_j = 1 if dimension j is in the Top-k and 0 otherwise.
  2. For each dimension, retain the status s_j with a larger probability and flip it with a smaller probability.
  3. Sample the reported dimension from the set of dimensions whose perturbed status is 1.
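
  A sketch of the perturbed-encoding idea, again my reconstruction: randomized response on each Top-k status bit, then a uniform draw among the dimensions reported as 1. The per-bit keep probability shown here is illustrative; the paper calibrates the probabilities so the whole encoding stays ε1-LDP.

  ```python
  import math
  import random

  def pe_select(accumulated, k, eps1):
      d = len(accumulated)
      top = set(sorted(range(d), key=lambda j: abs(accumulated[j]), reverse=True)[:k])
      p_keep = math.exp(eps1) / (math.exp(eps1) + 1.0)   # keep a status bit with this probability
      status = [(1 if j in top else 0) if random.random() < p_keep
                else (0 if j in top else 1)               # otherwise flip the bit
                for j in range(d)]
      ones = [j for j, s in enumerate(status) if s == 1]
      return random.choice(ones) if ones else random.randrange(d)  # fallback: every bit flipped to 0
  ```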

  31. Methods: Perturbed Sampling Mechanism (PS)
  1. Sort the accumulated vector; the Top-k status of the dimensions is denoted {s_1, …, s_d}.
  2. Sample a dimension directly: from the Top-k dimension set with a larger probability, or from the non-Top-k dimension set with a smaller probability.
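
  A sketch of the perturbed-sampling idea (my reconstruction; the exact group probabilities in the paper may differ): each Top-k dimension is e^ε1 times more likely to be reported than a non-Top-k dimension, which keeps the selection ε1-LDP.

  ```python
  import math
  import random

  def ps_select(accumulated, k, eps1):
      d = len(accumulated)
      order = sorted(range(d), key=lambda j: abs(accumulated[j]), reverse=True)
      top, rest = order[:k], order[k:]
      # Probability of picking the Top-k group, calibrated so that the ratio of
      # per-dimension output probabilities is bounded by e^eps1.
      p_top = k * math.exp(eps1) / (k * math.exp(eps1) + (d - k))
      if rest and random.random() >= p_top:
          return random.choice(rest)   # a uniform non-Top-k dimension
      return random.choice(top)        # a uniform Top-k dimension
  ```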

  32. Empirical Results
  • Even a small budget spent on dimension selection helps to increase the learning accuracy.
  • Private Top-k selection improves the learning utility independently of the mechanism used for perturbing the one selected dimension.

  33. Empirical Results: What we gain is much larger than what we lose from private and efficient Top-k selection.

  34. Summary
  Conclusion
  • We propose a two-stage framework for locally differentially private federated SGD.
  • We propose 3 private selection mechanisms for efficient dimension reduction under LDP.
  Takeaway
  • Private mechanisms can be specialized for sparse vectors.
  • Private Top-k dimension selection can improve learning utility under a given privacy level.
  Future work
  • Optimal hyper-parameter tuning.

  35. Thanks! (closing figure: trade-off axes, − Utility + and − Privacy +)
