Guarding User Privacy with Federated Learning and Differential Privacy
Brendan McMahan, mcmahan@google.com
DIMACS/Northeast Big Data Hub Workshop on Overcoming Barriers to Data Sharing including Privacy and Fairness, 2017.10.24
Our Goal: Imbue mobile devices with state-of-the-art machine learning systems, without centralizing data and with privacy by default. (Federated Learning)
A very personal computer. 2015: 79% of people are away from their phone ≤2 hours/day; 63% away ≤1 hour/day; 25% can't remember being away from it at all [1]. 2013: 72% of users were within 5 feet of their phone most of the time [2]. A plethora of sensors; innumerable digital interactions.
[1] 2015 Always Connected Research Report, IDC and Facebook. [2] 2013 Mobile Consumer Habits Study, Jumio and Harris Interactive.
Deep Learning: non-convex; millions of parameters; complex structure (e.g., LSTMs).
A distributed learning problem, horizontally partitioned. Nodes: millions to billions. Dimensions: thousands to millions. Examples: millions to billions.
Federated: decentralization, with a facilitator.
Deep Learning, the short short version
[Figure: a small neural network taking pixel inputs and answering "Is it 5?", with example activation values.]
f(input, parameters) = output
loss(parameters) = (1/n) ∑_i difference(f(input_i, parameters), desired_i)
Adjust the parameters to minimize the loss.
Stochastic Gradient Descent (SGD):
1. Choose a random subset of the training data.
2. Compute the "down" direction on the loss function (the negative gradient).
3. Take a step in that direction.
(Rinse & repeat; a sketch in code follows.)
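To make the loop concrete, here is a minimal NumPy sketch of minibatch SGD on a simple linear model; the model, data shapes, and learning rate are illustrative choices, not from the talk.

```python
import numpy as np

def loss_and_grad(params, inputs, desired):
    """Mean squared 'difference' between f(input, parameters) and desired."""
    preds = inputs @ params                      # f(input, parameters) = output
    residual = preds - desired
    loss = np.mean(residual ** 2)
    grad = 2.0 * inputs.T @ residual / len(desired)
    return loss, grad

def sgd(params, inputs, desired, steps=1000, batch_size=32, lr=0.1):
    for _ in range(steps):
        # 1. Choose a random subset of the training data.
        idx = np.random.choice(len(desired), size=batch_size, replace=False)
        # 2. Compute the "down" direction on the loss (negative gradient).
        _, grad = loss_and_grad(params, inputs[idx], desired[idx])
        # 3. Take a step in that direction.
        params = params - lr * grad
    return params
```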
Cloud-centric ML for Mobile
The model lives in the cloud: a cloud service holds the current model parameters.
We train models in the cloud, on training data gathered there.
[Diagram: a mobile device alongside the cloud-hosted current model parameters.]
Make predictions in the cloud: the device sends a prediction request to the server.
Gather training data in the cloud: prediction requests come in, and training data accumulates on the server.
And make the models better, by training on that data.
On-Device Predictions (Inference)
Instead of making predictions in the cloud via prediction requests...
Distribute the model, make predictions on device.
1. On-Device Inference → On-Device Training: bringing model training onto mobile devices (training data stays on device).
User Advantages:
● Low latency
● Longer battery life
● Less wireless data transfer
● Better offline experience
● Less data sent to the cloud (training data stays on device)
Developer Advantages:
● Data is already localized
● New product opportunities
● Straightforward personalization
● Simple access to rich user context
World Advantages:
● Raise privacy expectations for the industry
2. Federated Learning
Federated Learning
Federated Learning is the problem of training a shared global model, under the coordination of a central server, from a federation of participating devices which maintain control of their own data.
Federated Learning setting: mobile devices hold local training data; a cloud service holds the current model parameters.
Many devices will be offline at any given time.
1. The server selects a sample of, e.g., 100 online devices.
2. Selected devices download the current model parameters.
3. Each device computes an update using its local training data.
4. The server aggregates (∑) the users' updates into a new model. Repeat until convergence.
Applications of federated learning
What makes a good application?
● On-device data is more relevant than server-side proxy data
● On-device data is privacy sensitive or large
● Labels can be inferred naturally from user interaction
Example applications:
● Language modeling (e.g., next-word prediction) for mobile keyboards
● Image classification for predicting which photos people will share
● ...
Challenges of Federated Learning (or, why this isn't just "standard" distributed optimization):
● Massively distributed: training data is stored across a very large number of devices
● Limited communication: only a handful of rounds of unreliable communication with each device
● Unbalanced data: some devices have few examples, some have orders of magnitude more
● Highly non-IID data: data on each device reflects one individual's usage pattern
● Unreliable compute nodes: devices go offline unexpectedly; expect faults and adversaries
● Dynamic data availability: the subset of data available is non-constant, e.g. time-of-day vs. country
The Federated Averaging algorithm
Server, until converged:
1. Select a random subset (e.g. 100) of the (online) clients.
2. In parallel, send the current parameters θ_t to those clients.
3. θ_{t+1} = θ_t + data-weighted average of the client updates.
Selected client k:
1. Receive θ_t from the server.
2. Run some number of minibatch SGD steps on local data, producing θ'.
3. Return the update θ' − θ_t to the server.
H. B. McMahan, et al. Communication-Efficient Learning of Deep Networks from Decentralized Data. AISTATS 2017.
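Below is a minimal single-machine simulation of one FederatedAveraging round, as a sketch under assumptions: clients are plain (x, y) arrays held in memory, the local model is a simple linear model with squared loss, and names like `client_update` and `server_round` are illustrative rather than from the paper.

```python
import numpy as np

def client_update(theta_t, x, y, epochs=1, batch_size=32, lr=0.1):
    """Selected client: receive theta_t, run minibatch SGD locally, return theta' - theta_t."""
    theta = theta_t.copy()
    n = len(y)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb, yb = x[start:start + batch_size], y[start:start + batch_size]
            grad = 2.0 * xb.T @ (xb @ theta - yb) / len(yb)   # gradient of squared loss
            theta -= lr * grad
    return theta - theta_t, n                                 # update and local example count

def server_round(theta_t, clients, num_selected=100):
    """Server: sample online clients, collect updates, apply the data-weighted average."""
    selected = np.random.choice(len(clients), size=min(num_selected, len(clients)), replace=False)
    updates, weights = [], []
    for k in selected:
        x, y = clients[k]
        delta, n_k = client_update(theta_t, x, y)
        updates.append(delta)
        weights.append(n_k)
    weights = np.array(weights, dtype=float)
    avg_update = np.average(np.stack(updates), axis=0, weights=weights)
    return theta_t + avg_update                               # theta_{t+1}
```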
Large-scale LSTM for next-word prediction
Rounds to reach 10.5% accuracy:
  FedSGD  820
  FedAvg   35
23x decrease in communication rounds.
Model details: 1.35M parameters; 10K word dictionary; embeddings ∈ ℝ^96, state ∈ ℝ^256; corpus: Reddit posts, by author.
CIFAR-10 convolutional model
Updates to reach 82% accuracy (IID and balanced data):
  SGD     31,000
  FedSGD   6,600
  FedAvg     630
49x decrease in communication (updates) vs. SGD.
Federated Learning & Privacy
4. The server aggregates (∑) users' updates into a new model; repeat until convergence.
Might these updates contain privacy-sensitive data?
The updates are:
1. Ephemeral
2. Focused
3. Only in aggregate (∑)
Improve privacy & security by minimizing the "attack surface".
Wouldn't it be even better if... Google aggregates (∑) users' updates, but cannot inspect the individual updates?
A novel, practical protocol: Google aggregates users' updates, but cannot inspect the individual updates.
K. Bonawitz, et al. Practical Secure Aggregation for Privacy-Preserving Machine Learning. CCS 2017.
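Very roughly, the protocol's effect is that clients add pairwise random masks that cancel when the server sums the masked updates, so the server only learns the sum. The toy NumPy sketch below illustrates just that cancellation; it omits dropout handling, key agreement, and all of the cryptography that makes the real protocol practical.

```python
import numpy as np

def mask_updates(updates, seed=0):
    """Each pair of clients (i, j), i < j, agrees on a random mask m_ij.
    Client i adds m_ij, client j subtracts it; the masks cancel in the sum."""
    rng = np.random.default_rng(seed)
    n = len(updates)
    masked = [u.astype(float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.normal(size=updates[0].shape)   # shared pairwise mask m_ij
            masked[i] += m
            masked[j] -= m
    return masked

updates = [np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 0.5])]
masked = mask_updates(updates)
# Individual masked updates look random, but their sum equals the true sum.
assert np.allclose(sum(masked), sum(updates))
```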
Might the final model memorize a user's data?
1. Ephemeral
2. Focused
3. Only in aggregate (∑)
4. Differentially private
Differential Privacy
Differential Privacy (trusted aggregator): the server aggregates (∑) users' updates and adds noise (+) to the aggregate.
Federated Averaging
Server, until converged:
1. Select a random subset (e.g. C=100) of the (online) clients.
2. In parallel, send the current parameters θ_t to those clients.
3. θ_{t+1} = θ_t + data-weighted average of the client updates.
Selected client k:
1. Receive θ_t from the server.
2. Run some number of minibatch SGD steps, producing θ'.
3. Return θ' − θ_t to the server.
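With a trusted aggregator, a common recipe for making this round differentially private is to clip each client update to an L2 bound and add Gaussian noise calibrated to that bound when averaging. The sketch below shows the shape of that idea; the parameter names and the sensitivity bookkeeping are illustrative assumptions, not necessarily the exact mechanism from the talk.

```python
import numpy as np

def dp_average(updates, clip_norm=1.0, noise_multiplier=1.0, seed=0):
    """Trusted-aggregator DP sketch: clip each client update to L2 norm `clip_norm`,
    average the clipped updates, and add Gaussian noise scaled to the per-client
    sensitivity of that average (roughly clip_norm / number_of_clients)."""
    rng = np.random.default_rng(seed)
    clipped = []
    for u in updates:
        norm = np.linalg.norm(u)
        scale = min(1.0, clip_norm / (norm + 1e-12))   # clip large updates, leave small ones alone
        clipped.append(u * scale)
    avg = np.mean(np.stack(clipped), axis=0)
    sensitivity = clip_norm / len(updates)             # one client moves the average by at most this
    noise = rng.normal(scale=noise_multiplier * sensitivity, size=avg.shape)
    return avg + noise
```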